Abstract. SPARQL is the W3C candidate recommendation query language for
RDF. In this paper we address systematically the formal study of SPARQL,
concentrating on its graph pattern facility. We consider for this study a
fragment without literals and a simple version of filters which encompasses
all the main issues yet is simple to formalize. We provide a compositional
semantics, prove there are normal forms, prove complexity bounds (among
others, that the evaluation of SPARQL patterns is PSPACE-complete), compare
our semantics to an alternative operational semantics, give simple and
natural conditions under which both semantics coincide, and discuss
optimization procedures.



1 Introduction 

The Resource Description Framework (RDF) [14] is a data model for represent- 
ing information about World Wide Web resources. Jointly with its release in 
1998 as Recommendation of the W3C, the natural problem of querying RDF 
data was raised. Since then, several designs and implementations of RDF query 
languages have been proposed (see [11] for a recent survey). In 2004 the RDF 
Data Access Working Group (part of the Semantic Web Activity) released a first 
public working draft of a query language for RDF, called SPARQL [16], whose 
specification does not include RDF Schema. Currently (April 2006) SPARQL is 
a W3C Candidate Recommendation. 

Essentially, SPARQL is a graph-matching query language. Given a data
source D, a query consists of a pattern which is matched against D, and the
values obtained from this matching are processed to give the answer. The data
source D to be queried can be composed of multiple sources. A SPARQL query
consists of three parts. The first is the pattern matching part, which
includes several interesting features of pattern matching of graphs, like
optional parts, union of patterns, nesting, filtering (or restricting) values
of possible matchings, and the possibility of choosing the data source to be
matched by a pattern. The second is the solution modifiers, which, once the
output of the pattern is ready (in the form of a table of values of
variables), allow these values to be modified by applying classical operators
like projection, distinct, order, limit, and offset. Finally, the output of
a SPARQL query can be of different types: yes/no queries, selections of values
of the variables which match the patterns, construction of new triples from
these values, and descriptions of resources.



Although taken one by one the features of SPARQL are simple to describe
and understand, it turns out that the combination of them makes SPARQL into
a complex language, whose semantics is far from being understood. In fact, the
semantics of SPARQL currently given in the document [16], as we show in this
paper, does not cover all the complexities brought by the constructs involved in
SPARQL, and includes ambiguities, gaps and features difficult to understand.
The interpretations of the examples and the semantics of cases not covered in
[16] are currently a matter of long discussions on the W3C mailing lists.

The natural conclusion is that work on formalization of the semantics of 
SPARQL is needed. A formal approach to this subject is beneficial for several 
reasons, including to serve as a tool to identify and derive relations among the 
constructors, identify redundant and contradicting notions, and to study the 
complexity, expressiveness, and further natural database questions like rewrit- 
ing and optimization. To the best of our knowledge, there is no work today ad- 
dressing this formalization systematically. There are proposals addressing partial 
aspects of the semantics of some fragments of SPARQL. There are also works 
addressing formal issues of the semantics of query languages for RDF which can 
be of use for SPARQL. In fact, SPARQL shares several constructs with other 
proposals of query languages for RDF. In the related work section, we discuss 
these developments in more detail. None of these works, nevertheless, covers 
the problems posed by the core constructors of SPARQL from the syntactic, 
semantic, algorithmic and computational complexity point of view, which is the 
subject of this paper. 

Contributions. An in-depth analysis of the semantics benefits from abstracting
some features which, although relevant, in a first stage tend to obscure the
interplay of the basic constructors used in the language. One of our main goals
was to isolate a core fragment of SPARQL simple enough to be the subject matter
of a formal analysis, but expressive enough to capture the core complexities
of the language. In this direction, we chose the graph pattern matching
facility, which is additionally one of the most complex parts of the language.
The fragment isolated consists of the grammar of patterns restricted to queries
on one dataset (i.e. not considering the dataset graph pattern) over RDF
without the vocabulary of RDF Schema and without literals. There are two other
sources of abstraction which do not alter SPARQL in essential ways: we use set
semantics as opposed to the bag semantics implied in the document of the W3C,
and we avoid blanks in the syntax of patterns, because in our fragment they can
be replaced by variables [10, 5].

The contributions of this paper are: 

- A streamlined version of the core fragment of SPARQL with precise syntax
  and semantics. A formal version of SPARQL helps to clarify cases where
  the current English-wording semantics gives little information, to identify
  problem areas, and to propose solutions.

- We present a compositional semantics for patterns in SPARQL, prove that
  there is a notion of normal form for graph patterns in the fragment
  considered, and indicate optimization procedures and rules for the operators
  based on them.

- We give a thorough analysis of the computational complexity of the fragment.
  Among other bounds, we prove that the complexity of evaluation of a general
  graph pattern in SPARQL is PSPACE-complete even if we do not consider filter
  conditions.

- We formalize a natural procedural semantics which is implicitly used by
  developers. We compare these two semantics, the operational and the
  compositional mentioned above. We show that, under some slight and
  reasonable syntactic restrictions on the scope of variables, they coincide,
  thus isolating a natural fragment having a clear semantics and an efficient
  evaluation procedure.

1.1 Related Work 

Works on the SPARQL semantics. A rich source on the intended semantics of the 
constructors of SPARQL are the discussions around W3C document [16], which 
is still in the stage of Candidate Recommendation. Nevertheless, systematic and 
comprehensive approaches to define the semantics are not present, and most of 
the discussion is based on use cases. 

Cyganiak [4] presents a relational model of SPARQL. The author uses rela- 
tional algebra operators (join, left outer join, projection, selection, etc.) to model 
SPARQL SELECT clauses. The central idea in [4] is to make a correspondence 
between SPARQL queries and relational algebra queries over a single relation 
T(S, P, O). Indeed a translation system between SPARQL and SQL is outlined. 
The system needs extensive use of COALESCE and IS NULL operations to resemble 
SPARQL features. The relational algebra operators and their semantics in [4] 
are similar to our operators and have similar syntactic and semantic issues. With 
different motivations, but similar philosophy, Harris [12] presents an implemen- 
tation of SPARQL queries in a relational database engine. He uses relational 
algebra operators similar to [4]. This line of work, which models the semantics 
of SPARQL based on the semantics of some relational operators, seems to be 
very influential in the decisions on the W3C semantics of SPARQL.

De Bruijn et al. [5] address the definition of mapping for SPARQL from a
logical point of view. Their definition differs slightly from the one in [16]
on the issue of blank nodes. Although De Bruijn et al.'s definition allows
blank nodes in graph patterns, it is similar to our definition, which does not
allow blanks in patterns. In their approach, these blanks play the role of
"non-distinguished" variables, that is, variables which are not presented in
the answer.

Franconi and Tessaris [6], in an ongoing work on the semantics of SPARQL,
formally define the solution for a basic graph pattern (an RDF graph with
variables) as a set of partial functions. They also consider RDF datasets and
several forms of RDF entailment. Finally, they propose high level operators
(Join, Optional, etc.) that take sets of mappings and return sets of mappings,
but currently they do not have formal definitions for them, stating only their
types, i.e., the domain and codomain.



Works on semantics of RDF query languages. There are several works on the 
semantics of RDF query languages which tangentially touch the issues addressed 
by SPARQL. Gutierrez et al. [10] discuss the basic issues of the semantics and 
complexity of a conjunctive query language for RDF with basic patterns which 
underlies the basic evaluation approach of SPARQL. 

Haase et al. [11] present a comparison of functionalities of pre-SPARQL query 
languages, many of which served as inspiration for the constructs of SPARQL. 
There is, nevertheless, no formal semantics involved. 

The idea of having an algebraic query language for RDF is not new. In 
fact, there are several proposals. Chen et al. [3] present a set of operators for 
manipulating RDF graphs, Frasincar et al. [7] study algebraic operators on the 
lines of the RQL query language, and Robertson [17] introduces an algebra 
of triadic relations for RDF. Although they demonstrate the power of an
algebraic approach to querying RDF, the frameworks presented in these works
do not make it evident how to model the constructors of SPARQL with them.

Finally, Serfiotis et al. [19] study RDFS query fragments using a logical frame- 
work, presenting results on the classical database problems of containment and 
minimization of queries for a model of RDF/S. They concentrate on patterns 
using the RDF/S vocabulary of classes and properties in conjunctive queries, 
making the overlap with our fragment and approach almost empty. 

Organization of the paper. The rest of the paper is organized as follows.
Section 2 presents a formalized algebraic syntax and a compositional semantics
for SPARQL. Section 3 presents the complexity study of the fragment considered.
Section 4 presents an in-depth discussion of graph patterns not including the
UNION operator. Finally, Section 5 presents some conclusions. Appendix A
contains detailed proofs of all important results.

2 Syntax and Semantics of SPARQL 

In this section, we give an algebraic formalization of the core fragment of SPARQL 
over simple RDF, that is, RDF without RDFS vocabulary and literal rules. This 
allows us to take a close look at the core components of the language and identify 
some of its fundamental properties (for details on RDF formalization see [10],
or [15] for a complete reference including RDFS vocabulary).

Assume there are pairwise disjoint infinite sets I, B, and L (IRIs, Blank
nodes, and RDF literals, respectively). A triple
$(v_1, v_2, v_3) \in (I \cup B) \times I \times (I \cup B \cup L)$ is called
an RDF triple. In this tuple, $v_1$ is the subject, $v_2$ the predicate and
$v_3$ the object. We denote by T the union $I \cup B \cup L$. Assume
additionally the existence of an infinite set V of variables disjoint from the
above sets.

Definition 1. An RDF graph [13] is a set of RDF triples. In our context, we
refer to an RDF graph as an RDF dataset, or simply a dataset.



2.1 Syntax of SPARQL graph pattern expressions 

In order to avoid ambiguities in the parsing, we present the syntax of SPARQL 
graph patterns in a more traditional algebraic way, using the binary operators 
UNION, AND and OPT, and FILTER. We fully parenthesize expressions and 
make explicit the left associativity of OPTIONAL and the precedence of AND over 
OPTIONAL implicit in [16]. 

A SPARQL graph pattern expression is defined recursively as follows: 

(1) A tuple from $(T \cup V) \times (I \cup V) \times (T \cup V)$ is a graph
    pattern (a triple pattern).

(2) If $P_1$ and $P_2$ are graph patterns, then the expressions
    ($P_1$ AND $P_2$), ($P_1$ OPT $P_2$), and ($P_1$ UNION $P_2$) are graph
    patterns.

(3) If P is a graph pattern and R is a SPARQL built-in condition, then the
    expression (P FILTER R) is a graph pattern.

A SPARQL built-in condition is constructed using elements of the set
$V \cup T$ and constants, logical connectives ($\neg$, $\wedge$, $\vee$),
inequality symbols ($<$, $\leq$, $\geq$, $>$), the equality symbol ($=$),
unary predicates like bound, isBlank, and isIRI, plus other features (see [16]
for a complete list).

In this paper, we restrict to the fragment of filters where the built-in condition 
is a boolean combination of terms constructed by using = and bound, that is: 

(1) If $?X, ?Y \in V$ and $c \in I \cup L$, then bound(?X), ?X = c and
    ?X = ?Y are built-in conditions.

(2) If $R_1$ and $R_2$ are built-in conditions, then ($\neg R_1$),
    ($R_1 \vee R_2$) and ($R_1 \wedge R_2$) are built-in conditions.

Additionally, we assume that for (P FILTER R) the condition
$var(R) \subseteq var(P)$ holds, where var(R) and var(P) are the sets of
variables occurring in R and P, respectively. Variables in R not occurring in
P bring issues that are not computationally desirable. Consider the example of
a built-in condition R defined as ?X = ?Y for two variables not occurring in
P. What should be the result of evaluating (P FILTER R)? We decide not to
address this discussion here.
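
As an illustration, the syntactic restriction $var(R) \subseteq var(P)$ can
be checked mechanically. The sketch below is only one possible encoding and is
not part of the formalization: the tagged-tuple representation of patterns and
conditions, and the names is_var, pattern_vars, condition_vars and
is_safe_filter, are assumptions made for this example.

```python
# Hypothetical encoding (an assumption of this sketch, not from the paper):
# - a triple pattern is a 3-tuple of strings; variables start with "?"
# - compound patterns are ("AND", P1, P2), ("OPT", P1, P2),
#   ("UNION", P1, P2) and ("FILTER", P, R)
# - conditions are ("bound", x), ("=", x, y), ("not", R),
#   ("or", R1, R2) and ("and", R1, R2)

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def pattern_vars(p):
    """Return var(P), the variables occurring in the graph pattern p."""
    if p[0] in ("AND", "OPT", "UNION") and isinstance(p[1], tuple):
        return pattern_vars(p[1]) | pattern_vars(p[2])
    if p[0] == "FILTER" and isinstance(p[1], tuple):
        return pattern_vars(p[1])
    return {term for term in p if is_var(term)}  # triple pattern

def condition_vars(r):
    """Return var(R), the variables occurring in the built-in condition r."""
    if r[0] in ("bound", "="):
        return {term for term in r[1:] if is_var(term)}
    if r[0] == "not":
        return condition_vars(r[1])
    return condition_vars(r[1]) | condition_vars(r[2])  # "or" / "and"

def is_safe_filter(p, r):
    """Check the restriction var(R) <= var(P) for the pattern (P FILTER R)."""
    return condition_vars(r) <= pattern_vars(p)

pattern = ("OPT", ("?A", "name", "?N"), ("?A", "phone", "?P"))
safe = is_safe_filter(pattern, ("=", "?P", "777-3426"))   # ?P occurs in P
unsafe = is_safe_filter(pattern, ("=", "?X", "?Y"))       # ?X, ?Y do not
```

With this encoding, the problematic condition ?X = ?Y discussed above is
rejected before any evaluation takes place.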

2.2 Semantics of SPARQL graph pattern expressions 

To define the semantics of SPARQL graph pattern expressions, we need to
introduce some terminology. A mapping $\mu$ from V to T is a partial function
$\mu : V \to T$. Abusing notation, for a triple pattern t we denote by
$\mu(t)$ the triple obtained by replacing the variables in t according to
$\mu$. The domain of $\mu$, $dom(\mu)$, is the subset of V where $\mu$ is
defined. Two mappings $\mu_1$ and $\mu_2$ are compatible when for all
$x \in dom(\mu_1) \cap dom(\mu_2)$, it is the case that
$\mu_1(x) = \mu_2(x)$, i.e. when $\mu_1 \cup \mu_2$ is also a mapping. Note
that two mappings with disjoint domains are always compatible, and that the
empty mapping $\mu_\emptyset$ (i.e. the mapping with empty domain) is
compatible with any other mapping. Let $\Omega_1$ and $\Omega_2$ be sets of
mappings. We define the join of, the union of and the difference between
$\Omega_1$ and $\Omega_2$ as:

$\Omega_1 \bowtie \Omega_2 = \{\mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1, \mu_2 \in \Omega_2 \text{ are compatible mappings}\}$,

$\Omega_1 \cup \Omega_2 = \{\mu \mid \mu \in \Omega_1 \text{ or } \mu \in \Omega_2\}$,

$\Omega_1 \setminus \Omega_2 = \{\mu \in \Omega_1 \mid \text{for all } \mu' \in \Omega_2, \mu \text{ and } \mu' \text{ are not compatible}\}$.



Based on the previous operators, we define the left outer-join as:

$\Omega_1 \leftouterjoin \Omega_2 = (\Omega_1 \bowtie \Omega_2) \cup (\Omega_1 \setminus \Omega_2)$.
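
The four operators on sets of mappings translate almost literally into code.
The following is a minimal sketch, assuming that a mapping is represented as a
frozenset of (variable, value) pairs so that it can be stored in a Python set;
the helper mapping and the sample sets omega1 and omega2 are introduced only
for this illustration.

```python
# A mapping is modelled as a frozenset of (variable, value) pairs
# (an assumption of this sketch), so that sets of mappings are plain sets.

def mapping(**bindings):
    """Build a mapping; mapping(A="B1") binds the variable ?A to B1."""
    return frozenset(("?" + var, val) for var, val in bindings.items())

def compatible(m1, m2):
    """True when m1 and m2 agree on every variable in both domains."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(o1, o2):
    """Union of every compatible pair of mappings from o1 and o2."""
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}

def union(o1, o2):
    return o1 | o2

def difference(o1, o2):
    """Mappings of o1 that are compatible with no mapping of o2."""
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}

def left_outer_join(o1, o2):
    return join(o1, o2) | difference(o1, o2)

omega1 = {mapping(A="B1"), mapping(A="B2")}
omega2 = {mapping(A="B2", E="john@acd.edu")}
result = left_outer_join(omega1, omega2)
```

Here result contains the extended mapping for B2 and the unextended mapping
for B1, which is exactly the behaviour expected from an optional match.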

We are ready to define the semantics of graph pattern expressions as a function
$[[\cdot]]_D$ which takes a pattern expression and returns a set of mappings.
We follow the approach in [10], defining the semantics as the set of mappings
that matches the dataset D. For simplicity, in this work we assume all datasets
are already lean, i.e. (for simple RDF graphs) they do not have redundancies,
which, as is proved in [10], ensures the property that for all patterns and
datasets, if $D \equiv D'$ then $[[P]]_D = [[P]]_{D'}$. This issue is not
discussed in [16].



Definition 2. Let D be an RDF dataset over T, t a triple pattern and
$P_1$, $P_2$ graph patterns. Then the evaluation of a graph pattern over D,
denoted by $[[\cdot]]_D$, is defined recursively as follows:

(1) $[[t]]_D = \{\mu \mid dom(\mu) = var(t) \text{ and } \mu(t) \in D\}$,
    where var(t) is the set of variables occurring in t.

(2) $[[(P_1 \text{ AND } P_2)]]_D = [[P_1]]_D \bowtie [[P_2]]_D$.

(3) $[[(P_1 \text{ OPT } P_2)]]_D = [[P_1]]_D \leftouterjoin [[P_2]]_D$.

(4) $[[(P_1 \text{ UNION } P_2)]]_D = [[P_1]]_D \cup [[P_2]]_D$.

The semantics of FILTER expressions goes as follows. Given a mapping $\mu$ and
a built-in condition R, we say that $\mu$ satisfies R, denoted by
$\mu \models R$, if:

(1) R is bound(?X) and $?X \in dom(\mu)$;

(2) R is ?X = c, $?X \in dom(\mu)$ and $\mu(?X) = c$;

(3) R is ?X = ?Y, $?X \in dom(\mu)$, $?Y \in dom(\mu)$ and
    $\mu(?X) = \mu(?Y)$;

(4) R is ($\neg R_1$), $R_1$ is a built-in condition, and it is not the case
    that $\mu \models R_1$;

(5) R is ($R_1 \vee R_2$), $R_1$ and $R_2$ are built-in conditions, and
    $\mu \models R_1$ or $\mu \models R_2$;

(6) R is ($R_1 \wedge R_2$), $R_1$ and $R_2$ are built-in conditions,
    $\mu \models R_1$ and $\mu \models R_2$.

Definition 3. Given an RDF dataset D and a FILTER expression (P FILTER R),

$[[(P \text{ FILTER } R)]]_D = \{\mu \in [[P]]_D \mid \mu \models R\}$.
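
The satisfaction relation can be sketched as a recursive function over the
structure of R. In this illustration a mapping is a Python dict and conditions
are tagged tuples; both choices, as well as the names satisfies and
apply_filter, are assumptions of the sketch. The sample mappings are the ones
obtained by hand for ((?A, name, ?N) OPT (?A, phone, ?P)) over the dataset of
Example 1.

```python
# Conditions are tagged tuples (an assumption of this sketch):
# ("bound", x), ("=", x, y), ("not", R), ("or", R1, R2), ("and", R1, R2).
# A mapping is a dict from variable names (starting with "?") to values.

def satisfies(mu, r):
    """Return True when the mapping mu satisfies the built-in condition r."""
    op = r[0]
    if op == "bound":
        return r[1] in mu
    if op == "=":
        x, y = r[1], r[2]
        if x not in mu:
            return False              # an unbound ?X never satisfies ?X = ...
        if y.startswith("?"):
            return y in mu and mu[x] == mu[y]
        return mu[x] == y             # comparison with a constant c
    if op == "not":
        return not satisfies(mu, r[1])
    if op == "or":
        return satisfies(mu, r[1]) or satisfies(mu, r[2])
    if op == "and":
        return satisfies(mu, r[1]) and satisfies(mu, r[2])
    raise ValueError(f"unknown condition: {r!r}")

def apply_filter(mappings, r):
    """Keep the mappings that satisfy r, as in Definition 3."""
    return [mu for mu in mappings if satisfies(mu, r)]

name_opt_phone = [
    {"?A": "B1", "?N": "paul", "?P": "777-3426"},
    {"?A": "B2", "?N": "john"},
    {"?A": "B3", "?N": "george"},
    {"?A": "B4", "?N": "ringo", "?P": "888-4537"},
]
kept = apply_filter(name_opt_phone, ("=", "?P", "777-3426"))
```

Note that a mapping in which ?P is unbound fails ?P = 777-3426 but satisfies
its negation, since case (4) is defined through "it is not the case that".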
Example 1. Consider the RDF dataset D: 



D = { ($B_1$, name, paul),               ($B_1$, phone, 777-3426),
      ($B_2$, name, john),               ($B_2$, email, john@acd.edu),
      ($B_3$, name, george),             ($B_3$, webPage, www.george.edu),
      ($B_4$, name, ringo),              ($B_4$, email, ringo@acd.edu),
      ($B_4$, webPage, www.starr.edu),   ($B_4$, phone, 888-4537) }



The following are graph pattern expressions and their evaluations over D ac- 
cording to the above semantics: 

(1) $P_1$ = ((?A, email, ?E) OPT (?A, webPage, ?W)). Then $[[P_1]]_D$ is:

                ?A       ?E               ?W
    $\mu_1$ :   $B_2$    john@acd.edu
    $\mu_2$ :   $B_4$    ringo@acd.edu    www.starr.edu

(2) $P_2$ = (((?A, name, ?N) OPT (?A, email, ?E)) OPT (?A, webPage, ?W)).
    Then $[[P_2]]_D$ is:

                ?A       ?N        ?E               ?W
    $\mu_1$ :   $B_1$    paul
    $\mu_2$ :   $B_2$    john      john@acd.edu
    $\mu_3$ :   $B_3$    george                     www.george.edu
    $\mu_4$ :   $B_4$    ringo     ringo@acd.edu    www.starr.edu

(3) $P_3$ = ((?A, name, ?N) OPT ((?A, email, ?E) OPT (?A, webPage, ?W))).
    Then $[[P_3]]_D$ is:

                ?A       ?N        ?E               ?W
    $\mu_1$ :   $B_1$    paul
    $\mu_2$ :   $B_2$    john      john@acd.edu
    $\mu_3$ :   $B_3$    george
    $\mu_4$ :   $B_4$    ringo     ringo@acd.edu    www.starr.edu

    Note the difference between $[[P_2]]_D$ and $[[P_3]]_D$. These two examples
    show that $[[((A \text{ OPT } B) \text{ OPT } C)]]_D \neq
    [[(A \text{ OPT } (B \text{ OPT } C))]]_D$ in general.

(4) $P_4$ = ((?A, name, ?N) AND ((?A, email, ?E) UNION (?A, webPage, ?W))).
    Then $[[P_4]]_D$ is:

                ?A       ?N        ?E               ?W
    $\mu_1$ :   $B_2$    john      john@acd.edu
    $\mu_2$ :   $B_3$    george                     www.george.edu
    $\mu_3$ :   $B_4$    ringo     ringo@acd.edu
    $\mu_4$ :   $B_4$    ringo                      www.starr.edu

(5) $P_5$ = (((?A, name, ?N) OPT (?A, phone, ?P)) FILTER ?P = 777-3426).
    Then $[[P_5]]_D$ is:

                ?A       ?N        ?P
    $\mu_1$ :   $B_1$    paul      777-3426

2.3 A simple normal form for graph patterns 

We say that two graph pattern expressions $P_1$ and $P_2$ are equivalent,
denoted by $P_1 \equiv P_2$, if $[[P_1]]_D = [[P_2]]_D$ for every RDF
dataset D.

Proposition 1. Let $P_1$, $P_2$ and $P_3$ be graph pattern expressions and R a
built-in condition. Then:

(1) AND and UNION are associative and commutative.

(2) ($P_1$ AND ($P_2$ UNION $P_3$)) $\equiv$
    (($P_1$ AND $P_2$) UNION ($P_1$ AND $P_3$)).

(3) ($P_1$ OPT ($P_2$ UNION $P_3$)) $\equiv$
    (($P_1$ OPT $P_2$) UNION ($P_1$ OPT $P_3$)).

(4) (($P_1$ UNION $P_2$) OPT $P_3$) $\equiv$
    (($P_1$ OPT $P_3$) UNION ($P_2$ OPT $P_3$)).

(5) (($P_1$ UNION $P_2$) FILTER R) $\equiv$
    (($P_1$ FILTER R) UNION ($P_2$ FILTER R)).

The application of the above equivalences allows any graph pattern to be
translated into an equivalent one of the form:

$P_1$ UNION $P_2$ UNION $P_3$ UNION $\cdots$ UNION $P_n$,            (1)

where each $P_i$ ($1 \leq i \leq n$) is a UNION-free expression. In Section 4,
we study UNION-free graph pattern expressions.



3 Complexity of Evaluating Graph Pattern Expressions 

A fundamental issue in any query language is the complexity of query evaluation 
and, in particular, what is the influence of each component of the language in this 
complexity. In this section, we address these issues for graph pattern expressions. 

As is customary when studying the complexity of the evaluation problem
for a query language, we consider its associated decision problem. We denote
this problem by Evaluation and we define it as follows:

INPUT : An RDF dataset D, a graph pattern P and a mapping $\mu$.
QUESTION : Is $\mu \in [[P]]_D$?

We start this study by considering the fragment consisting of graph pattern
expressions constructed by using only AND and FILTER operators. This simple
fragment is interesting as it does not use the two most complicated operators in
SPARQL, namely UNION and OPT. Given an RDF dataset D, a graph pattern
P in this fragment and a mapping $\mu$, it is possible to efficiently check
whether $\mu \in [[P]]_D$ by using the following algorithm. First, for each
triple t in P, verify whether $\mu(t) \in D$. If this is not the case, then
return false. Otherwise, by using a bottom-up approach, verify whether the
expression generated by instantiating the variables in P according to $\mu$
satisfies the FILTER conditions in P. If this is the case, then return true,
else return false. Thus, we conclude that:

Theorem 1. Evaluation can be solved in time $O(|P| \cdot |D|)$ for graph
pattern expressions constructed by using only AND and FILTER operators.

We continue this study by adding to the above fragment the UNION operator. 
It is important to notice that the inclusion of UNION in SPARQL is one of the 
most controversial issues in the definition of this language. In fact, in the W3C 
candidate recommendation for SPARQL [16], one can read the following: "The 
working group decided on this design and closed the disjunction issue without 
reaching consensus. The objection was that adding UNION would complicate 
implementation and discourage adoption". In the following theorem, we show 
that indeed the inclusion of the UNION operator makes the evaluation problem for
SPARQL considerably harder: 

Theorem 2. Evaluation is NP-complete for graph pattern expressions con- 
structed by using only AND, FILTER and UNION operators. 

We conclude this study by adding to the above fragments the OPT operator. 
This operator is probably the most complicated in graph pattern expressions 
and, definitely, the most difficult to define. The following theorem shows that
the evaluation problem becomes even harder if we include the OPT operator: 

Theorem 3. Evaluation is PSPACE-complete for graph pattern expressions.

It is worth mentioning that in the proof of Theorem 3, we actually show that
Evaluation remains PSPACE-complete if we consider expressions without FILTER
conditions, showing that the main source of complexity in SPARQL comes
from the combination of UNION and OPT operators.



When verifying whether $\mu \in [[P]]_D$, it is natural to assume that the
size of P is considerably smaller than the size of D. This assumption is very
common when studying the complexity of a query language. In fact, it is named
data-complexity in the database literature [20] and it is defined as the
complexity of the evaluation problem for a fixed query. More precisely, for the
case of SPARQL, given a graph pattern expression P, the evaluation problem for
P, denoted by Evaluation(P), has as input an RDF dataset D and a mapping
$\mu$, and the problem is to verify whether $\mu \in [[P]]_D$. From known
results for the data-complexity of first-order logic [20], it is easy to deduce
that:

Theorem 4. Evaluation(P) is in LOGSPACE for every graph pattern ex- 
pression P. 

4 On the Semantics of UNION-free Pattern Expressions 

The exact semantics of graph pattern expressions has been largely discussed on
the mailing list of the W3C. There seem to be two main approaches proposed to
compute answers to a graph pattern expression P. The first uses an operational
semantics and consists essentially in the execution of a depth-first traversal
of the parse tree of P and the use of the intermediate results to avoid some
computations. This approach is the one followed by ARQ [1] (a language
developed by HPLabs) in the cases we tested, and by the W3C when evaluating
graph pattern expressions containing nested optionals [18]. For instance, the
computation of the mappings satisfying (A OPT (B OPT C)) is done by first
computing the mappings that match A, then checking which of these mappings
match B, and for those that match B checking whether they also match C [18].
The second approach, compositional in spirit and the one we advocate here,
extends classical conjunctive query evaluation [10] and is based on a bottom-up
evaluation of the parse tree, borrowing notions of relational algebra
evaluation [4, 12] plus some additional features.

As expected, there are queries for which the two approaches do not coincide
(see Section 4.1 for examples). However, both semantics coincide in most of
the "real-life" examples. For instance, for all the queries in the W3C candidate
recommendation for SPARQL, both semantics coincide [16]. Thus, a natural
question is what the exact relationship between the two approaches mentioned
above is and, in particular, whether there is a "natural" condition under which
both approaches coincide. In this section, we address these questions: Section 4.1
formally introduces the depth-first approach, discusses some issues concerning
it, and presents queries for which the two semantics do not coincide; Section 4.2
identifies a natural and simple condition under which these two semantics are
equivalent; Section 4.3 defines a normal form and simple optimization procedures
for patterns satisfying the condition of Section 4.2.

Based on the results of Section 2.3, we concentrate on the critical fragment
of UNION-free graph pattern expressions.



4.1 A depth-first approach to evaluate graph pattern expressions

As we mentioned earlier, one alternative to evaluate graph pattern expressions
is based on a "greedy" approach that computes the mappings satisfying a graph
pattern expression P by traversing the parse tree of P in a depth-first manner
and using the intermediate results to avoid some computations. This evaluation
includes at each stage three parameters: the dataset, the subtree pattern of P
to be evaluated, and a set of mappings already collected. Formally, given an RDF
dataset D, the evaluation of pattern P with the set of mappings $\Omega$,
denoted by $Eval_D(P, \Omega)$, is a recursive function defined as follows:

$Eval_D$(P: graph pattern expression, $\Omega$: set of mappings)
    if $\Omega = \emptyset$ then return($\emptyset$)
    if P is a triple pattern t then return($\Omega \bowtie [[t]]_D$)
    if P = ($P_1$ AND $P_2$) then return $Eval_D(P_2, Eval_D(P_1, \Omega))$
    if P = ($P_1$ OPT $P_2$) then return
        $Eval_D(P_1, \Omega) \leftouterjoin Eval_D(P_2, Eval_D(P_1, \Omega))$
    if P = ($P_1$ FILTER R) then return
        $\{\mu \in Eval_D(P_1, \Omega) \mid \mu \models R\}$

Then, the evaluation of P against a dataset D, which we denote simply by
$Eval_D(P)$, is $Eval_D(P, \{\mu_\emptyset\})$, where $\mu_\emptyset$ is the
mapping with empty domain.

Example 2. Assume that P = ($t_1$ OPT ($t_2$ OPT $t_3$)), where $t_1$, $t_2$
and $t_3$ are triple patterns. To compute $Eval_D(P)$, we invoke function
$Eval_D(P, \{\mu_\emptyset\})$. This function in turn invokes function
$Eval_D(t_1, \{\mu_\emptyset\})$, which returns $[[t_1]]_D$ since $t_1$ is a
triple pattern and $[[t_1]]_D \bowtie \{\mu_\emptyset\} = [[t_1]]_D$, and then
it invokes $Eval_D((t_2 \text{ OPT } t_3), [[t_1]]_D)$. As in the previous
case, $Eval_D((t_2 \text{ OPT } t_3), [[t_1]]_D)$ first invokes
$Eval_D(t_2, [[t_1]]_D)$, which returns $[[t_1]]_D \bowtie [[t_2]]_D$ since
$t_2$ is a triple pattern, and then it invokes
$Eval_D(t_3, [[t_1]]_D \bowtie [[t_2]]_D)$. Since $t_3$ is a triple pattern,
the latter invocation returns
$[[t_1]]_D \bowtie [[t_2]]_D \bowtie [[t_3]]_D$. Thus, by the definition of
$Eval_D$ we have that $Eval_D((t_2 \text{ OPT } t_3), [[t_1]]_D)$ returns
$([[t_1]]_D \bowtie [[t_2]]_D) \leftouterjoin
([[t_1]]_D \bowtie [[t_2]]_D \bowtie [[t_3]]_D)$. Therefore, $Eval_D(P)$
returns

$[[t_1]]_D \leftouterjoin \big(([[t_1]]_D \bowtie [[t_2]]_D) \leftouterjoin ([[t_1]]_D \bowtie [[t_2]]_D \bowtie [[t_3]]_D)\big)$.

Note that the previous result coincides with the evaluation algorithm proposed
by the W3C for graph pattern ($t_1$ OPT ($t_2$ OPT $t_3$)) [18], as we first
compute the mappings that match $t_1$, then we check which of these mappings
match $t_2$, and for those that match $t_2$ we check whether they also match
$t_3$. Also note that the result of $Eval_D(P)$ is not necessarily the same as
$[[P]]_D$ since $[[(t_1 \text{ OPT } (t_2 \text{ OPT } t_3))]]_D =
[[t_1]]_D \leftouterjoin ([[t_2]]_D \leftouterjoin [[t_3]]_D)$. In Example 3
we actually show a case where the two semantics do not coincide.

Some issues on the depth-first approach. There are two relevant issues to
consider when using the depth-first approach to evaluate SPARQL queries. First,
this approach is not compositional. For instance, the result of $Eval_D(P)$
cannot in general be used to obtain the result of
$Eval_D((P' \text{ OPT } P))$, or even the result of
$Eval_D((P' \text{ AND } P))$, as $Eval_D(P)$ results from the computation
of $Eval_D(P, \{\mu_\emptyset\})$ while $Eval_D((P' \text{ OPT } P))$ results
from the computation of $\Omega = Eval_D(P', \{\mu_\emptyset\})$ and
$Eval_D(P, \Omega)$. This can become a problem in cases of data integration
where global answers are obtained by combining the results from several data
sources, or when storing some pre-answered queries in order to obtain the
results of more complex queries by composition. Second, under the depth-first
approach some natural properties of widely used operators do not hold, which
may confuse some users. For example, it is not always the case that
$Eval_D((P_1 \text{ AND } P_2)) = Eval_D((P_2 \text{ AND } P_1))$, violating
the commutativity of the conjunction and making the result depend on the order
of the query.

Example 3. Let D be the RDF dataset shown in Example 1 and consider the pattern
P = ((?X, name, paul) OPT ((?Y, name, george) OPT (?X, email, ?Z))).
Then $[[P]]_D = \{\{?X \to B_1\}\}$, that is, $[[P]]_D$ contains only one
mapping. On the other hand, following the recursive definition of $Eval_D$ we
obtain that $Eval_D(P) = \{\{?X \to B_1, ?Y \to B_3\}\}$, which is different
from $[[P]]_D$.

Example 4 (Non-commutativity of AND). Let D be the RDF dataset in Example 1,
$P_1$ = ((?X, name, paul) AND ((?Y, name, george) OPT (?X, email, ?Z)))
and $P_2$ = (((?Y, name, george) OPT (?X, email, ?Z)) AND (?X, name, paul)).
Then $Eval_D(P_1) = \{\{?X \to B_1, ?Y \to B_3\}\}$ while
$Eval_D(P_2) = \emptyset$. Using the compositional semantics, we obtain
$[[P_1]]_D = [[P_2]]_D = \emptyset$.

Let us mention that ARQ [1] gives the same non-commutative evaluation. 
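
Examples 3 and 4 can be replayed with a short program that implements both
semantics side by side. The sketch is restricted to triple patterns, AND and
OPT, which is all these examples need. The encoding of patterns as tagged
tuples and of mappings as frozensets, and the function names sem (for
$[[\cdot]]_D$) and eval_d (for $Eval_D$), are assumptions of this illustration.

```python
D = {
    ("B1", "name", "paul"), ("B1", "phone", "777-3426"),
    ("B2", "name", "john"), ("B2", "email", "john@acd.edu"),
    ("B3", "name", "george"), ("B3", "webPage", "www.george.edu"),
    ("B4", "name", "ringo"), ("B4", "email", "ringo@acd.edu"),
    ("B4", "webPage", "www.starr.edu"), ("B4", "phone", "888-4537"),
}
MU_EMPTY = frozenset()  # the mapping with empty domain

def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}

def left_outer_join(o1, o2):
    unmatched = {m1 for m1 in o1
                 if not any(compatible(m1, m2) for m2 in o2)}
    return join(o1, o2) | unmatched

def match_triple(t, data):
    result = set()
    for triple in data:
        mu = {}
        for term, value in zip(t, triple):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            result.add(frozenset(mu.items()))
    return result

def sem(p, data):
    """Compositional semantics [[P]]_D (AND and OPT only)."""
    if p[0] in ("AND", "OPT") and isinstance(p[1], tuple):
        left, right = sem(p[1], data), sem(p[2], data)
        return join(left, right) if p[0] == "AND" else left_outer_join(left, right)
    return match_triple(p, data)

def eval_d(p, data, omega=None):
    """Depth-first evaluation Eval_D(P, Omega) (AND and OPT only)."""
    if omega is None:
        omega = {MU_EMPTY}
    if not omega:
        return set()
    if p[0] in ("AND", "OPT") and isinstance(p[1], tuple):
        first = eval_d(p[1], data, omega)
        second = eval_d(p[2], data, first)
        return second if p[0] == "AND" else left_outer_join(first, second)
    return join(omega, match_triple(p, data))

t1 = ("?X", "name", "paul")
t2 = ("?Y", "name", "george")
t3 = ("?X", "email", "?Z")
example3 = ("OPT", t1, ("OPT", t2, t3))
example4_p1 = ("AND", t1, ("OPT", t2, t3))
example4_p2 = ("AND", ("OPT", t2, t3), t1)
```

Calling sem and eval_d on example3, example4_p1 and example4_p2 with the
dataset D yields the sets of mappings stated in Examples 3 and 4.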

4.2 A natural condition ensuring $[[P]]_D = Eval_D(P)$

If for a pattern P we have that $[[P]]_D = Eval_D(P)$ for every RDF dataset D,
then we have the best of both worlds for P, as the compositional approach gives
a formal semantics to P while the depth-first approach gives an efficient way of
evaluating it. Thus, it is desirable to identify natural syntactic conditions
on P ensuring $[[P]]_D = Eval_D(P)$. In this section, we introduce one such
condition.

One of the most delicate issues in the definition of a semantics for graph
pattern expressions is the semantics of the OPT operator. A careful examination
of the conflicting examples reveals a common pattern: a graph pattern P mentions
an expression P' = ($P_1$ OPT $P_2$) and a variable ?X occurring both in $P_2$
and outside P' but not occurring in $P_1$. For instance, in the graph pattern
expression shown in Example 3:

P = ((?X, name, paul) OPT ((?Y, name, george) OPT (?X, email, ?Z))),

the variable ?X occurs both in the optional part of the sub-pattern
P' = ((?Y, name, george) OPT (?X, email, ?Z)) and outside P' in the triple
(?X, name, paul), but it is not mentioned in (?Y, name, george).

What is unnatural about graph pattern P is the fact that (?X, email, ?Z) is
giving optional information for (?X, name, paul) but in P appears as giving
optional information for (?Y, name, george). In general, graph pattern
expressions having the condition mentioned above are not natural. In fact, no
queries in the W3C candidate recommendation for SPARQL [16] exhibit this
condition. This motivates the following definition:



Definition 4. A graph pattern P is well designed if for every occurrence of a
sub-pattern P' = ($P_1$ OPT $P_2$) of P and for every variable ?X occurring in
P, the following condition holds:

if ?X occurs both in $P_2$ and outside P', then it also occurs in $P_1$.

Graph pattern expressions that are not well designed are shown in Examples 3
and 4. For all these patterns, the two semantics differ. The next result shows
a fundamental property of well-designed graph pattern expressions, and is a
welcome surprise, as a very simple restriction on graph patterns allows the
users of SPARQL to use either of the two semantics shown in this section:

Theorem 5. Let D be an RDF dataset and P a well-designed graph pattern 
expression. Then Eval_D(P) = [[P]]_D. 
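
The contrast between the two semantics can be reproduced on a toy example. The sketch below is an illustration under stated assumptions: the data set D, the tuple encoding of patterns and every function name are hypothetical, and the operational procedure follows the recursion for Eval_D(P, Ω) that is used in the proof of Theorem 5 in the appendix (triple patterns, AND and OPT only).

```python
# Illustrative sketch; D and all names are assumptions, not the paper's code.
# A mapping is a frozenset of (variable, value) pairs.

def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}

def left_outer_join(o1, o2):
    unmatched = {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}
    return join(o1, o2) | unmatched

def match(triple, data):
    """Mappings sending the triple pattern to some triple of the data set."""
    result = set()
    for fact in data:
        mapping, ok = {}, True
        for term, value in zip(triple, fact):
            if term.startswith("?"):
                ok = ok and mapping.setdefault(term, value) == value
            else:
                ok = ok and term == value
        if ok:
            result.add(frozenset(mapping.items()))
    return result

def semantics(p, data):
    """Compositional semantics [[P]]_D."""
    if p[0] == "TRIPLE":
        return match(p[1:], data)
    left, right = semantics(p[1], data), semantics(p[2], data)
    return join(left, right) if p[0] == "AND" else left_outer_join(left, right)

def eval_op(p, data, omega):
    """Operational Eval_D(P, Omega): results flow from left to right."""
    if p[0] == "TRIPLE":
        return join(omega, match(p[1:], data))
    left = eval_op(p[1], data, omega)
    right = eval_op(p[2], data, left)
    return right if p[0] == "AND" else left_outer_join(left, right)

def evaluate(p, data):
    return eval_op(p, data, {frozenset()})  # start from the empty mapping

D = {("B1", "name", "paul"), ("B2", "name", "george"),
     ("B2", "email", "george@example.org")}

not_well_designed = ("OPT", ("TRIPLE", "?X", "name", "paul"),
                     ("OPT", ("TRIPLE", "?Y", "name", "george"),
                             ("TRIPLE", "?X", "email", "?Z")))
well_designed_pattern = ("OPT", ("TRIPLE", "?X", "name", "?N"),
                         ("TRIPLE", "?X", "email", "?E"))
```

On this data the pattern of Example 3 yields the single mapping {?X → B1} compositionally but {?X → B1, ?Y → B2} operationally, whereas the two results coincide for the well-designed pattern.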

4.3 Well-designed patterns and normalization 

Due to the evident similarity between certain operators of SPARQL and 
relational algebra, a natural question is whether the classical results on normal 
forms and optimization in relational algebra are applicable in the SPARQL context. 
The answer is not straightforward, at least for the case of optional patterns and 
their relational counterpart, the left outer join. The classical results about outer 
join query reordering and optimization by Galindo-Legaria and Rosenthal [8] are 
not directly applicable in the SPARQL context because they assume constraints 
on the relational queries that are rarely found in SPARQL. The first and most 
problematic issue is the assumption that the predicates used for joining (outer 
joining) relations are null-rejecting [8]. In SPARQL, those predicates are implicit 
in the variables that the graph patterns share, and by the definition of compatible 
mappings they are never null-rejecting. In [8] the queries are also required not to 
contain Cartesian products, a situation that occurs often in SPARQL when joining 
graph patterns that do not share variables. Thus, specific techniques must be 
developed in the SPARQL context. 

In what follows we show that the property of a pattern being well designed 
has important consequences for the study of normalization and optimization for a 
fragment of SPARQL queries. In this section we restrict ourselves to graph 
patterns without FILTER. 

We start with equivalences that hold between sub-patterns of well-designed 
graph patterns. 

Proposition 2. Given a well-designed graph pattern P, if the left-hand sides of 
the following equations are sub-patterns of P, then: 

(P1 AND (P2 OPT P3)) ≡ ((P1 AND P2) OPT P3),    (2) 
((P1 OPT P2) OPT P3) ≡ ((P1 OPT P3) OPT P2).    (3) 

Moreover, in both equivalences, if one replaces in P the left-hand side by the 
right-hand side, then the resulting pattern is still well designed. 



From this proposition plus associativity and commutativity of AND, it follows: 

Theorem 6. Every well-designed graph pattern P is equivalent to a pattern in 
the following normal form: 

(· · · (t_1 AND · · · AND t_k) OPT O_1) OPT O_2) · · · ) OPT O_n),    (4) 

where each t_i is a triple pattern, n ≥ 0 and each O_j has the same form (4). 

The proof of the theorem is based on term rewriting techniques. The next 
example shows the benefits of using the above normal form. 

Example 5. Consider dataset D of Example 1 and well-designed pattern P = 
(((?X, name, ?Y) OPT (?X, email, ?E)) AND (?X, phone, 888-4537)). The 
normalized form of P is P' = (((?X, name, ?Y) AND (?X, phone, 888-4537)) OPT 
(?X, email, ?E)). The advantage of evaluating P' over P follows from a simple 
counting of mappings. 
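
The normalization of Example 5 can be sketched in a few lines of Python. This is not the term-rewriting argument of the paper, only an illustration that assumes the input is well designed and FILTER-free; the tuple encoding and the function names are hypothetical. It uses equivalence (2) together with commutativity of AND to move every OPT outwards.

```python
# Hypothetical encoding: ("TRIPLE", s, p, o), ("AND", P1, P2), ("OPT", P1, P2).

def split(pattern):
    """Return (triples, optionals): the mandatory triple patterns and the
    already normalized optional parts of the normal form (4)."""
    if pattern[0] == "TRIPLE":
        return [pattern], []
    op, left, right = pattern
    triples, optionals = split(left)
    if op == "AND":
        # Equivalence (2) plus commutativity of AND pulls each OPT outwards.
        right_triples, right_optionals = split(right)
        return triples + right_triples, optionals + right_optionals
    # op == "OPT": the optional part is normalized recursively.
    return triples, optionals + [normalize(right)]

def normalize(pattern):
    """Rebuild (...((t1 AND ... AND tk) OPT O1) ... OPT On)."""
    triples, optionals = split(pattern)
    result = triples[0]
    for triple in triples[1:]:
        result = ("AND", result, triple)
    for optional in optionals:
        result = ("OPT", result, optional)
    return result

name = ("TRIPLE", "?X", "name", "?Y")
email = ("TRIPLE", "?X", "email", "?E")
phone = ("TRIPLE", "?X", "phone", "888-4537")

p = ("AND", ("OPT", name, email), phone)   # pattern P of Example 5
p_normal = normalize(p)                    # pattern P' of Example 5
```

Here `p_normal` joins the name and phone triples first and applies the optional email part last, so the optional extension is computed only for mappings that already satisfy both mandatory triples.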

Two examples of implicit use of the normal form. There are implementations 
(not ARQ [1]) that do not permit nested optionals, and when evaluating a 
pattern they first evaluate all patterns that are outside optionals and then extend 
the results with the matchings of patterns inside optionals. That is, they 
implicitly use the normal form mentioned above. In [4], when evaluating a 
graph pattern with relational algebra, a similar assumption is made: first the 
join of all triple patterns is evaluated, and then the optional patterns are taken 
into account. Again, this is an implicit use of the normal form. 

5 Conclusions 

The query language SPARQL is in the process of standardization, and in this 
process the semantics of the language plays a key role. A formalization of the 
semantics is beneficial on several grounds: it helps identify relationships among 
the constructors that stay hidden in the use cases, identify redundant and 
contradictory notions, study the expressiveness and complexity of the language, 
and support optimization. 

In this paper, we provided such a formal semantics for the graph pattern 
matching facility, which is the core of SPARQL. We isolated a fragment which 
is rich enough to present the main issues and which favors a good formalization. 
We presented a formal semantics, made observations on the current syntax based 
on it, and proved several of its properties. We did a complexity analysis showing 
that unlimited use of OPT could lead to high complexity, namely PSPACE. 
We presented an alternative formal procedural semantics which closely 
resembles the one used by most developers. We proved that under simple syntactic 
restrictions both semantics are equivalent, thus having the advantages of a formal 
compositional semantics and the efficiency of a procedural semantics. Finally, we 
discussed optimization based on relational algebra and showed limitations based 
on features of SPARQL. Along these lines, we presented optimizations based on 
normal forms. 

Further work should concentrate on the extension of these ideas to the whole 
language, and particularly on the extension to RDF Schema, which even the 
current specification of SPARQL lacks. 

A Proofs and Intermediate Results 



A.1 Some technical results 

Lemma 1. All the following equivalences hold: 

(1) If P is a graph pattern and R1, R2 are built-in conditions such that var(R1) ⊆ 
var(P) and var(R2) ⊆ var(P), then 

((P FILTER R1) FILTER R2) ≡ (P FILTER (R1 ∧ R2)), 

(P FILTER (R1 ∨ R2)) ≡ ((P FILTER R1) UNION (P FILTER R2)). 

(2) If P1 and P2 are conjunctions of triple patterns and R is a built-in condition 
such that var(R) ⊆ var(P1), then 

((P1 FILTER R) AND P2) ≡ ((P1 AND P2) FILTER R). 

Proof: (1.1) Let D be an RDF database. Assume first that μ ∈ 
[[((P FILTER R1) FILTER R2)]]_D. Then μ ∈ [[(P FILTER R1)]]_D and μ ⊨ R2. 
Thus, μ ∈ [[P]]_D, μ ⊨ R1 and μ ⊨ R2. Therefore, μ ⊨ (R1 ∧ R2) and, 
hence, we conclude that μ ∈ [[(P FILTER (R1 ∧ R2))]]_D. Now assume that 
μ ∈ [[(P FILTER (R1 ∧ R2))]]_D. Then μ ∈ [[P]]_D and μ ⊨ (R1 ∧ R2). Thus, 
μ ∈ [[P]]_D, μ ⊨ R1 and μ ⊨ R2. We conclude that μ ∈ [[(P FILTER R1)]]_D and, 
therefore, given that μ ⊨ R2, we have that μ ∈ [[((P FILTER R1) FILTER R2)]]_D. 
(1.2) Given an RDF database D, we have that: 

[[(P FILTER (R1 ∨ R2))]]_D = {μ ∈ [[P]]_D | μ ⊨ (R1 ∨ R2)} 

= {μ ∈ [[P]]_D | μ ⊨ R1 or μ ⊨ R2} 

= {μ ∈ [[P]]_D | μ ⊨ R1} ∪ {μ ∈ [[P]]_D | μ ⊨ R2} 

= [[(P FILTER R1)]]_D ∪ [[(P FILTER R2)]]_D 

= [[((P FILTER R1) UNION (P FILTER R2))]]_D. 

(2) Let D be an RDF database. Assume first that μ ∈ 
[[((P1 FILTER R) AND P2)]]_D. Then there exist μ1 ∈ [[(P1 FILTER R)]]_D 
and μ2 ∈ [[P2]]_D such that μ1 and μ2 are compatible and μ = μ1 ∪ μ2. Since 
μ1 ∈ [[(P1 FILTER R)]]_D, we have that μ1 ∈ [[P1]]_D and μ1 ⊨ R. Given that P1 is a 
conjunction of triple patterns and var(R) ⊆ var(P1), we have that μ1(?X) is defined 
for every ?X ∈ var(R). Thus, given that μ1 ⊨ R and μ1 is contained in μ, we 
conclude that μ ⊨ R. Therefore, given that μ1 ∈ [[P1]]_D and μ2 ∈ [[P2]]_D, we have that 
μ = μ1 ∪ μ2 ∈ [[(P1 AND P2)]]_D and, hence, μ ∈ [[((P1 AND P2) FILTER R)]]_D. 
Now assume that μ ∈ [[((P1 AND P2) FILTER R)]]_D. Then μ ⊨ R and 
μ ∈ [[(P1 AND P2)]]_D and, therefore, there exist μ1 ∈ [[P1]]_D and μ2 ∈ [[P2]]_D 
such that μ1 and μ2 are compatible and μ = μ1 ∪ μ2. Given that (P1 AND P2) 
is a conjunction of triple patterns and var(R) ⊆ var(P1) ⊆ var((P1 AND P2)), 
we have that μ(?X) is defined for every ?X ∈ var(R). Moreover, given that 
P1 is a conjunction of triple patterns and var(R) ⊆ var(P1), we have that 
μ1(?X) = μ(?X) for every ?X ∈ var(R) and, hence, μ1 ⊨ R. We deduce that 
μ1 ∈ [[(P1 FILTER R)]]_D and, hence, μ = μ1 ∪ μ2 ∈ [[((P1 FILTER R) AND P2)]]_D. 
This concludes the proof of the equivalence of ((P1 FILTER R) AND P2) and 
((P1 AND P2) FILTER R). □ 



Lemma 2. Let P be a UNION-free graph pattern expression. Then we have that 

(P AND P) ≡ P. 

Proof: Next we show by induction on the structure of P that for every RDF 
database D and pair of mappings μ1, μ2 ∈ [[P]]_D, if μ1 and μ2 are compatible, 
then μ1 = μ2. It is easy to see that this condition implies that (P AND P) ≡ P. 
If P is a triple pattern, then the property trivially holds. Assume first that P = 
(P1 AND P2), where P1 and P2 satisfy the condition, that is, if ξ, ζ ∈ [[P_i]]_D 
(i = 1, 2) and ξ, ζ are compatible, then ξ = ζ. Let μ1 and μ2 be compatible 
mappings in [[P]]_D. Then there exist ν1, ν2 ∈ [[P1]]_D and ω1, ω2 ∈ [[P2]]_D such 
that μ1 = ν1 ∪ ω1 and μ2 = ν2 ∪ ω2. Given that μ1 and μ2 are compatible, we 
have that ν1, ν2 are compatible and ω1, ω2 are compatible. Thus, by induction 
hypothesis we have that ν1 = ν2 and ω1 = ω2 and, hence, μ1 = μ2. Second, assume 
that P = (P1 OPT P2), and let μ1 and μ2 be compatible mappings in [[P]]_D. We 
consider four cases. 

(1) If there exist ν1, ν2 ∈ [[P1]]_D and ω1, ω2 ∈ [[P2]]_D such that μ1 = ν1 ∪ ω1 and 
μ2 = ν2 ∪ ω2, then we conclude that μ1 = μ2 as in the case P = (P1 AND P2). 

(2) If μ1, μ2 ∈ [[P1]]_D and both are not compatible with any mapping in [[P2]]_D, 
then by induction hypothesis we conclude that μ1 = μ2. 

(3) If μ1 ∈ [[P1]]_D, μ1 is not compatible with any mapping in [[P2]]_D, μ2 = ν2 ∪ ω2, 
ν2 ∈ [[P1]]_D and ω2 ∈ [[P2]]_D, then given that μ1 and μ2 are compatible, we have 
that μ1 and ν2 are compatible. Thus, by induction hypothesis we conclude that 
μ1 = ν2 and, therefore, μ1 is compatible with ω2 ∈ [[P2]]_D, which contradicts 
our original assumption. 

(4) If μ1 = ν1 ∪ ω1, ν1 ∈ [[P1]]_D, ω1 ∈ [[P2]]_D, μ2 ∈ [[P1]]_D and μ2 is not compatible 
with any mapping in [[P2]]_D, then we obtain a contradiction as in the previous 
case. 

Finally, assume that P = (P1 FILTER R), where P1 satisfies the condition. Let μ1 
and μ2 be compatible mappings in [[P]]_D. Then μ1 ∈ [[P1]]_D, μ1 ⊨ R, μ2 ∈ [[P1]]_D, 
μ2 ⊨ R and, thus, μ1 = μ2 by induction hypothesis. This concludes the proof of 
the lemma. □ 
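
The UNION-free hypothesis of Lemma 2 is needed. The short Python sketch below illustrates this on a toy data set that is assumed here for the purpose of the example (the sets of mappings are written out by hand, and the helper names are hypothetical): joining the result of a UNION with itself produces a new mapping.

```python
# A mapping is a frozenset of (variable, value) pairs.

def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}

# Assumed toy data set D = {(a, p, b), (c, q, d)}.
triple_1 = {frozenset({("?X", "a")})}   # [[(?X, p, b)]]_D
triple_2 = {frozenset({("?Y", "c")})}   # [[(?Y, q, d)]]_D

union_free = join(triple_1, triple_2)   # [[(t1 AND t2)]]_D
with_union = triple_1 | triple_2        # [[(t1 UNION t2)]]_D

idempotent_without_union = join(union_free, union_free) == union_free
idempotent_with_union = join(with_union, with_union) == with_union
```

For the UNION pattern the self-join contains {?X → a, ?Y → c}, which is not in [[(t1 UNION t2)]]_D, so `idempotent_with_union` is false while `idempotent_without_union` is true.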



A.2 Proof of Proposition 1 

(1) Associativity and commutativity are consequences of the definitions of the 
operators AND and UNION. 

(2) To prove that (P1 AND (P2 UNION P3)) ≡ 
((P1 AND P2) UNION (P1 AND P3)), we consider two cases. First, we show 
that for every RDF database D, we have that [[(P1 AND (P2 UNION P3))]]_D ⊆ 
[[((P1 AND P2) UNION (P1 AND P3))]]_D. Assume that D is an RDF database 
and that μ ∈ [[(P1 AND (P2 UNION P3))]]_D. Then there exist μ1 ∈ [[P1]]_D 
and μ2 ∈ [[(P2 UNION P3)]]_D such that μ1 and μ2 are compatible and 
μ = μ1 ∪ μ2. If μ2 ∈ [[P2]]_D, then we have that μ = μ1 ∪ μ2 ∈ [[(P1 AND P2)]]_D 
and, therefore, μ ∈ [[((P1 AND P2) UNION (P1 AND P3))]]_D. Analogously, if 
μ2 ∈ [[P3]]_D, then we have that μ = μ1 ∪ μ2 ∈ [[(P1 AND P3)]]_D and, therefore, 
μ ∈ [[((P1 AND P2) UNION (P1 AND P3))]]_D. Second, we prove that for every 
RDF database D, we have that [[((P1 AND P2) UNION (P1 AND P3))]]_D ⊆ 
[[(P1 AND (P2 UNION P3))]]_D. Assume that D is an RDF database and that 
μ ∈ [[((P1 AND P2) UNION (P1 AND P3))]]_D. Then μ ∈ [[(P1 AND P2)]]_D 
or μ ∈ [[(P1 AND P3)]]_D. If μ ∈ [[(P1 AND P2)]]_D, then we conclude that 
there exist μ1 ∈ [[P1]]_D and μ2 ∈ [[P2]]_D such that μ1 and μ2 are compatible 
and μ = μ1 ∪ μ2. Since μ2 ∈ [[P2]]_D, we have that μ2 ∈ [[(P2 UNION P3)]]_D 
and, hence, μ = μ1 ∪ μ2 ∈ [[(P1 AND (P2 UNION P3))]]_D. If 
μ ∈ [[(P1 AND P3)]]_D, then we conclude that there exist μ1 ∈ [[P1]]_D 
and μ3 ∈ [[P3]]_D such that μ1 and μ3 are compatible and μ = μ1 ∪ μ3. 
Since μ3 ∈ [[P3]]_D, we have that μ3 ∈ [[(P2 UNION P3)]]_D and, 
therefore, μ = μ1 ∪ μ3 ∈ [[(P1 AND (P2 UNION P3))]]_D. This 
concludes the proof of the equivalence of (P1 AND (P2 UNION P3)) and 
((P1 AND P2) UNION (P1 AND P3)). 

To prove that (P1 OPT (P2 UNION P3)) ≡ 
((P1 OPT P2) UNION (P1 OPT P3)), we consider two cases. First, we show 
that for every RDF database D, we have that [[(P1 OPT (P2 UNION P3))]]_D ⊆ 
[[((P1 OPT P2) UNION (P1 OPT P3))]]_D. Let D be an RDF database and 
assume that μ ∈ [[(P1 OPT (P2 UNION P3))]]_D. Then there exists 
μ1 ∈ [[P1]]_D such that either (a) there exists μ2 ∈ [[(P2 UNION P3)]]_D 
such that μ1 and μ2 are compatible and μ = μ1 ∪ μ2, or (b) there is 
no μ2 ∈ [[(P2 UNION P3)]]_D such that μ1 and μ2 are compatible and 
μ = μ1. In case (a), if μ2 ∈ [[P2]]_D, then μ = μ1 ∪ μ2 ∈ [[(P1 OPT P2)]]_D, 
and if μ2 ∈ [[P3]]_D, then μ = μ1 ∪ μ2 ∈ [[(P1 OPT P3)]]_D. In both 
cases, we conclude that μ ∈ [[((P1 OPT P2) UNION (P1 OPT P3))]]_D. 
In case (b), we have that there is no μ2 ∈ [[P2]]_D such that μ1 and μ2 
are compatible and, hence, μ = μ1 ∈ [[(P1 OPT P2)]]_D. We conclude that 
μ ∈ [[((P1 OPT P2) UNION (P1 OPT P3))]]_D. Second, we show that for every 
RDF database D, we have that [[((P1 OPT P2) UNION (P1 OPT P3))]]_D ⊆ 
[[(P1 OPT (P2 UNION P3))]]_D. Let D be an RDF database and assume 
that μ ∈ [[((P1 OPT P2) UNION (P1 OPT P3))]]_D. Then there exists 
μ1 ∈ [[P1]]_D such that (a) there exists μ2 ∈ [[P2]]_D such that μ1 and μ2 are 
compatible and μ = μ1 ∪ μ2, or (b) there exists μ3 ∈ [[P3]]_D such that μ1 
and μ3 are compatible and μ = μ1 ∪ μ3, or (c) μ = μ1 and there is neither 
μ2 ∈ [[P2]]_D compatible with μ1 nor μ3 ∈ [[P3]]_D compatible with μ1. In 
case (a), given that μ2 ∈ [[P2]]_D, we have that μ2 ∈ [[(P2 UNION P3)]]_D 
and, therefore, μ = μ1 ∪ μ2 ∈ [[(P1 OPT (P2 UNION P3))]]_D. In case 
(b), given that μ3 ∈ [[P3]]_D, we have that μ3 ∈ [[(P2 UNION P3)]]_D and, 
therefore, μ = μ1 ∪ μ3 ∈ [[(P1 OPT (P2 UNION P3))]]_D. Finally, in case (c) 
we have that there is no μ' ∈ [[(P2 UNION P3)]]_D such that μ1 and μ' are 
compatible and, therefore, μ = μ1 ∈ [[(P1 OPT (P2 UNION P3))]]_D. This 
concludes the proof of the equivalence of (P1 OPT (P2 UNION P3)) and 
((P1 OPT P2) UNION (P1 OPT P3)). 

To prove that ((P1 UNION P2) OPT P3) ≡ 
((P1 OPT P3) UNION (P2 OPT P3)), we consider two cases. 
First, we show that for every RDF database D, we have that 
[[((P1 UNION P2) OPT P3)]]_D ⊆ [[((P1 OPT P3) UNION (P2 OPT P3))]]_D. Let 
D be an RDF database and assume that μ ∈ [[((P1 UNION P2) OPT P3)]]_D. 
Then either (a) there exist μ1 ∈ [[(P1 UNION P2)]]_D and μ2 ∈ [[P3]]_D such 
that μ1 and μ2 are compatible and μ = μ1 ∪ μ2, or (b) μ ∈ [[(P1 UNION P2)]]_D 
and there is no μ3 ∈ [[P3]]_D such that μ and μ3 are compatible. In case 
(a), if μ1 ∈ [[P1]]_D, then μ = μ1 ∪ μ2 ∈ [[(P1 OPT P3)]]_D. In case (a), if 
μ1 ∈ [[P2]]_D, then μ = μ1 ∪ μ2 ∈ [[(P2 OPT P3)]]_D. In case (b), if μ ∈ [[P1]]_D, 
then μ ∈ [[(P1 OPT P3)]]_D since μ is not compatible with any μ3 ∈ [[P3]]_D. In 
case (b), if μ ∈ [[P2]]_D, then μ ∈ [[(P2 OPT P3)]]_D since μ is not compatible 
with any μ3 ∈ [[P3]]_D. In any of the previous cases, we conclude that 
μ ∈ [[((P1 OPT P3) UNION (P2 OPT P3))]]_D. Second, we show that for every 
RDF database D, we have that [[((P1 OPT P3) UNION (P2 OPT P3))]]_D ⊆ 
[[((P1 UNION P2) OPT P3)]]_D. Let D be an RDF database and assume that 
μ ∈ [[((P1 OPT P3) UNION (P2 OPT P3))]]_D. Without loss of generality, we 
assume that μ ∈ [[(P1 OPT P3)]]_D. Then either (a) there exist μ1 ∈ [[P1]]_D 
and μ2 ∈ [[P3]]_D such that μ1 and μ2 are compatible and μ = μ1 ∪ μ2, 
or (b) μ ∈ [[P1]]_D and there is no μ3 ∈ [[P3]]_D such that μ and μ3 are 
compatible. In case (a), we have that μ1 ∈ [[(P1 UNION P2)]]_D and, hence, 
μ = μ1 ∪ μ2 ∈ [[((P1 UNION P2) OPT P3)]]_D. In case (b), we have that 
μ ∈ [[(P1 UNION P2)]]_D and, therefore, μ ∈ [[((P1 UNION P2) OPT P3)]]_D 
since μ is not compatible with any μ3 ∈ [[P3]]_D. This concludes 
the proof of the equivalence of ((P1 UNION P2) OPT P3) and 
((P1 OPT P3) UNION (P2 OPT P3)). 

(5) Clearly, for every RDF database D and built-in condition R, we have 
that {μ ∈ [[P1]]_D | μ ⊨ R} ⊆ {μ ∈ [[(P1 UNION P2)]]_D | μ ⊨ R} 
and {μ ∈ [[P2]]_D | μ ⊨ R} ⊆ {μ ∈ [[(P1 UNION P2)]]_D | μ ⊨ R} 
since [[P1]]_D ⊆ [[(P1 UNION P2)]]_D and [[P2]]_D ⊆ [[(P1 UNION P2)]]_D. 
Thus, we only need to show that for every RDF database D and built-in 
condition R, it is the case that [[((P1 UNION P2) FILTER R)]]_D ⊆ 
[[((P1 FILTER R) UNION (P2 FILTER R))]]_D. Assume that μ ∈ 
[[((P1 UNION P2) FILTER R)]]_D. Then μ ∈ [[(P1 UNION P2)]]_D and 
μ ⊨ R. Thus, if μ ∈ [[P1]]_D, then μ ∈ [[(P1 FILTER R)]]_D, and if 
μ ∈ [[P2]]_D, then μ ∈ [[(P2 FILTER R)]]_D. Therefore, we conclude that 
μ ∈ [[((P1 FILTER R) UNION (P2 FILTER R))]]_D. 

A.3 Proof of Theorem 2 

It is straightforward to prove that Evaluation is in NP for the case of graph 
pattern expressions constructed by using only AND, UNION and FILTER operators. 
To prove the NP-hardness of Evaluation for this case, we show how to reduce 
in polynomial time the satisfiability problem for propositional formulas in CNF 
(SAT-CNF) to our problem. An instance of SAT-CNF is a propositional formula 
φ of the form: 

C_1 ∧ · · · ∧ C_n, 

where each C_i (i ∈ [1, n]) is a clause, that is, a disjunction of propositional variables 
and negations of propositional variables. Then the problem is to verify whether 
there exists a truth assignment satisfying φ. It is known that SAT-CNF is 
NP-complete [9]. 

In the reduction from SAT-CNF, we use a fixed RDF database: 

D = {(a, b, c)}. 

Assume that x_1, ..., x_m is the list of propositional variables mentioned in φ. For 
each x_i (i ∈ [1, m]), we use SPARQL variables ?X_i, ?Y_i to represent x_i and ¬x_i, 
respectively. Then for each clause C in φ of the form: 

x_{i1} ∨ · · · ∨ x_{ik} ∨ ¬x_{j1} ∨ · · · ∨ ¬x_{jℓ}, 

we define a graph pattern P_C as: 

((a, b, ?X_{i1}) UNION · · · UNION (a, b, ?X_{ik}) UNION 
(a, b, ?Y_{j1}) UNION · · · UNION (a, b, ?Y_{jℓ})), 

and we define a graph pattern P_φ for φ as: 

(P AND ((P_{C_1} AND · · · AND P_{C_n}) FILTER R)), 

where: 

P = ((a, b, ?X_1) AND · · · AND (a, b, ?X_m) AND 
(a, b, ?Y_1) AND · · · AND (a, b, ?Y_m)), 
R = ((¬bound(?X_1) ∨ ¬bound(?Y_1)) ∧ · · · ∧ (¬bound(?X_m) ∨ ¬bound(?Y_m))). 

Let μ = {?X_1 → c, ..., ?X_m → c, ?Y_1 → c, ..., ?Y_m → c}. Then it is 
straightforward to prove that φ is satisfiable if and only if μ ∈ [[P_φ]]_D. 
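
The reduction can be exercised on small formulas. The following Python sketch is an illustration, not part of the proof: it evaluates the sub-patterns of P_φ directly as sets of mappings over D = {(a, b, c)}, and the integer encoding of literals as well as all function names are assumptions made here.

```python
from functools import reduce

# A mapping is a frozenset of (variable, value) pairs.

def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}

def triple(variable):
    """[[(a, b, variable)]]_D over the fixed database D = {(a, b, c)}."""
    return {frozenset({(variable, "c")})}

def satisfiable_via_sparql(clauses, m):
    """clauses: non-empty list of clauses, each a list of non-zero integers,
    where i stands for x_i and -i for its negation (assumed encoding)."""
    xs = [f"?X{i}" for i in range(1, m + 1)]
    ys = [f"?Y{i}" for i in range(1, m + 1)]

    # P binds every ?X_i and ?Y_i to c.
    p = reduce(join, [triple(v) for v in xs + ys])

    # P_C is a UNION of triple patterns: a union of their sets of mappings.
    clause_sets = []
    for clause in clauses:
        options = set()
        for lit in clause:
            options |= triple(f"?X{lit}" if lit > 0 else f"?Y{-lit}")
        clause_sets.append(options)
    conjunction = reduce(join, clause_sets)

    # FILTER R: for no i are both ?X_i and ?Y_i bound.
    def satisfies_r(mapping):
        bound = dict(mapping).keys()
        return not any(x in bound and y in bound for x, y in zip(xs, ys))

    filtered = {mu for mu in conjunction if satisfies_r(mu)}
    mu_all = frozenset((v, "c") for v in xs + ys)
    return mu_all in join(p, filtered)
```

For instance, the formula (x_1 ∨ x_2) ∧ ¬x_1 is encoded as `[[1, 2], [-1]]` with `m = 2`. Each mapping that survives the FILTER corresponds to a consistent choice of one true literal per clause.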

A.4 Proof of Theorem 3 

Membership in PSPACE is a corollary of the membership in PSPACE of the eval- 
uation problem for first-order logic [20]. 

To prove the PSPACE-hardness of Evaluation for the case of graph pattern 
expressions not containing FILTER conditions, we show how to reduce in polynomial 
time the quantified boolean formula problem (QBF) to our problem. An instance 
of QBF is a quantified propositional formula φ of the form: 

∀x_1 ∃y_1 ∀x_2 ∃y_2 ∀x_3 ∃y_3 · · · ∀x_m ∃y_m ψ, 

where ψ is a quantifier-free formula of the form C_1 ∧ · · · ∧ C_n, with each C_i (i ∈ [1, n]) 
being a disjunction of literals, that is, a disjunction of propositional variables and 
negations of propositional variables. Then the problem is to verify whether φ is 
valid. It is known that QBF is PSPACE-complete [9]. 

In the reduction from QBF, we use a fixed RDF database: 

D = {(a, tv, 0), (a, tv, 1), (a, false, 0), (a, true, 1)}. 

Then for each clause C in ψ of the form 

(u_1 ∨ · · · ∨ u_k) ∨ (¬v_1 ∨ · · · ∨ ¬v_ℓ), 

we define a graph pattern P_C as: 

((a, true, ?U_1) UNION · · · UNION (a, true, ?U_k) UNION 
(a, false, ?V_1) UNION · · · UNION (a, false, ?V_ℓ)), 

and we define a graph pattern P_ψ for ψ as: 

(P_{C_1} AND · · · AND P_{C_n}). 

It is easy to see that ψ is satisfiable if and only if there exists a mapping μ ∈ [[P_ψ]]_D. 
In particular, for each mapping μ, there exists a truth assignment σ_μ defined as 
σ_μ(x) = μ(?X) for every variable x in ψ, such that μ ∈ [[P_ψ]]_D if and only if σ_μ 
satisfies ψ. 

Now we explain how we represent the quantified propositional formula φ as a graph 
pattern expression P_φ. We use SPARQL variables ?X_1, ..., ?X_m and ?Y_1, ..., ?Y_m 
to represent propositional variables x_1, ..., x_m and y_1, ..., y_m, respectively, and we 
use SPARQL variables ?A_0, ?A_1, ..., ?A_m, ?B_0, ?B_1, ..., ?B_m and operators OPT 
and AND to represent the quantifier sequence ∀x_1 ∃y_1 · · · ∀x_m ∃y_m. More precisely, 
for every i ∈ [1, m], we define graph pattern expressions P_i and Q_i as follows: 

P_i := ((a, tv, ?X_1) AND · · · AND (a, tv, ?X_i) AND 
(a, tv, ?Y_1) AND · · · AND (a, tv, ?Y_{i-1}) AND 
(a, false, ?A_{i-1}) AND (a, true, ?A_i)), 

Q_i := ((a, tv, ?X_1) AND · · · AND (a, tv, ?X_i) AND 
(a, tv, ?Y_1) AND · · · AND (a, tv, ?Y_i) AND 
(a, false, ?B_{i-1}) AND (a, true, ?B_i)), 

and then we define P_φ as: 

((a, true, ?B_0) OPT (P_1 OPT (Q_1 OPT (P_2 OPT (Q_2 OPT ( · · · 
(P_m OPT (Q_m AND P_ψ)) · · · )))))). 

Next we show that we can use the graph pattern expression P_φ to check whether φ 
is valid. More precisely, we show that φ is valid if and only if μ ∈ [[P_φ]]_D, where μ 
is a mapping such that dom(μ) = {?B_0} and μ(?B_0) = 1. 

(⇐) Assume that μ ∈ [[P_φ]]_D. It is easy to see that [[P_1]]_D = {μ_0, μ_1}, where 
μ_0 = {?X_1 → 0, ?A_0 → 0, ?A_1 → 1} and μ_1 = {?X_1 → 1, ?A_0 → 0, ?A_1 → 1}. 
Thus, given that these two mappings are compatible with μ and that μ ∈ [[P_φ]]_D, 
there exist mappings ν_0 and ν_1 in [[Q_1]]_D such that μ_0, ν_0 are compatible, μ_1, ν_1 
are compatible and 

μ_0 ∪ ν_0 ∈ [[(P_1 OPT (Q_1 OPT (P_2 OPT (Q_2 OPT ( · · · 
(P_m OPT (Q_m AND P_ψ)) · · · )))))]]_D,    (5) 
μ_1 ∪ ν_1 ∈ [[(P_1 OPT (Q_1 OPT (P_2 OPT (Q_2 OPT ( · · · 
(P_m OPT (Q_m AND P_ψ)) · · · )))))]]_D.    (6) 

We note that ν_0(?X_1) = μ_0(?X_1) = 0, ν_1(?X_1) = μ_1(?X_1) = 1 and ν_0(?Y_1), 
ν_1(?Y_1) are not necessarily distinct. 

Since P_1 mentions triple (a, true, ?A_1) and P_2 mentions triple (a, false, ?A_1), there 
is no mapping in [[P_1]]_D compatible with some mapping in [[P_2]]_D. Furthermore, 
since Q_1 mentions (a, true, ?B_1) and Q_2 mentions triple (a, false, ?B_1), there is 
no mapping in [[Q_1]]_D compatible with some mapping in [[Q_2]]_D. Thus, given that 
(5) holds, for every mapping ζ ∈ [[P_2]]_D, we have that if ν_0 and ζ are compatible, 
then there exists ξ ∈ [[Q_2]]_D such that ζ and ξ are compatible and 

ζ ∪ ξ ∈ [[(P_2 OPT (Q_2 OPT ( · · · (P_m OPT (Q_m AND P_ψ)) · · · )))]]_D. 

There are two mappings in [[P_2]]_D which are compatible with ν_0: 

μ_00 = {?X_1 → 0, ?X_2 → 0, ?Y_1 → ν_0(?Y_1), ?A_1 → 0, ?A_2 → 1}, 
μ_01 = {?X_1 → 0, ?X_2 → 1, ?Y_1 → ν_0(?Y_1), ?A_1 → 0, ?A_2 → 1}. 

Thus, from the previous discussion we conclude that there exist mappings ν_00 and 
ν_01 such that μ_00, ν_00 are compatible, μ_01, ν_01 are compatible and 

μ_00 ∪ ν_00 ∈ [[(P_2 OPT (Q_2 OPT ( · · · (P_m OPT (Q_m AND P_ψ)) · · · )))]]_D, 
μ_01 ∪ ν_01 ∈ [[(P_2 OPT (Q_2 OPT ( · · · (P_m OPT (Q_m AND P_ψ)) · · · )))]]_D. 

Similarly, there are two mappings in [[P_2]]_D which are compatible with ν_1: 

μ_10 = {?X_1 → 1, ?X_2 → 0, ?Y_1 → ν_1(?Y_1), ?A_1 → 0, ?A_2 → 1}, 
μ_11 = {?X_1 → 1, ?X_2 → 1, ?Y_1 → ν_1(?Y_1), ?A_1 → 0, ?A_2 → 1}. 

Thus, given that (6) holds, we conclude that there exist mappings ν_10 and ν_11 such 
that μ_10, ν_10 are compatible, μ_11, ν_11 are compatible and 

μ_10 ∪ ν_10 ∈ [[(P_2 OPT (Q_2 OPT ( · · · (P_m OPT (Q_m AND P_ψ)) · · · )))]]_D, 
μ_11 ∪ ν_11 ∈ [[(P_2 OPT (Q_2 OPT ( · · · (P_m OPT (Q_m AND P_ψ)) · · · )))]]_D. 

If we continue in this fashion, we conclude that for every i ∈ [2, m − 1] and 
n_1 · · · n_i ∈ {0, 1}^i, and for the following mappings in [[P_{i+1}]]_D: 

μ_{n1···ni0} = {?X_1 → n_1, ..., ?X_i → n_i, ?X_{i+1} → 0, 
?Y_1 → ν_{n1}(?Y_1), ..., ?Y_i → ν_{n1···ni}(?Y_i), ?A_i → 0, ?A_{i+1} → 1}, 

μ_{n1···ni1} = {?X_1 → n_1, ..., ?X_i → n_i, ?X_{i+1} → 1, 
?Y_1 → ν_{n1}(?Y_1), ..., ?Y_i → ν_{n1···ni}(?Y_i), ?A_i → 0, ?A_{i+1} → 1}, 

there exist mappings ν_{n1···ni0} and ν_{n1···ni1} in [[Q_{i+1}]]_D such that μ_{n1···ni0}, ν_{n1···ni0} 
are compatible, μ_{n1···ni1}, ν_{n1···ni1} are compatible and 

μ_{n1···ni0} ∪ ν_{n1···ni0} ∈ [[(P_{i+1} OPT (Q_{i+1} OPT ( · · · 
(P_m OPT (Q_m AND P_ψ)) · · · )))]]_D, 
μ_{n1···ni1} ∪ ν_{n1···ni1} ∈ [[(P_{i+1} OPT (Q_{i+1} OPT ( · · · 
(P_m OPT (Q_m AND P_ψ)) · · · )))]]_D. 

In particular, for every n_1 · · · n_m ∈ {0, 1}^m, given that ν_{n1···nm} ∈ 
[[(Q_m AND P_ψ)]]_D, Q_m is a conjunction of triple patterns and var(P_ψ) ⊆ var(Q_m), 
we conclude that ν_{n1···nm} ∈ [[P_ψ]]_D. Hence, if σ_{n1···nm} is a truth assignment defined 
as σ_{n1···nm}(x) = ν_{n1···nm}(?X) for every variable x in ψ, then σ_{n1···nm} satisfies ψ. 
Thus, given that for every n_1 · · · n_m ∈ {0, 1}^m we have that: 

μ_{n1···ni}(?X_j) = ν_{n1···ni}(?X_j) = μ_{n1···nm}(?X_j)    i ∈ [1, m] and j ∈ [1, i], 
μ_{n1···ni}(?Y_k) = ν_{n1···ni}(?Y_k) = μ_{n1···nm}(?Y_k)    i ∈ [1, m] and k ∈ [1, i − 1], 
ν_{n1···ni}(?Y_i) = μ_{n1···nm}(?Y_i)    i ∈ [1, m], 

we conclude that φ is valid. 

(⇒) The proof that φ is valid implies μ ∈ [[P_φ]]_D is similar to the previous proof. 

A.5 Proof of Theorem 5 

To prove Theorem 5, we need some technical lemmas. 

Lemma 3. 

(1) Let Ω1, Ω2, and Ω3 be sets of mappings, then Ω1 ⋈ (Ω2 \ Ω3) ⊆ (Ω1 ⋈ Ω2) \ Ω3. 

(2) Let Ω1 and Ω2 be sets of mappings, then Ω1 \ Ω2 = Ω1 \ (Ω1 ⋈ Ω2). 

(3) Let P1, P2 be UNION-free graph pattern expressions and Ω1, Ω2 sets of 
mappings such that Ω1 ⊆ [[P1]]_D and Ω2 ⊆ [[P2]]_D. Then Ω1 ⟕ (Ω1 ⋈ Ω2) = 
Ω1 ⟕ Ω2. 

Proof: 

(1) Let μ ∈ Ω1 ⋈ (Ω2 \ Ω3); then μ = μ1 ∪ μ2 where μ1 ∈ Ω1 and μ2 ∈ Ω2 \ Ω3, 
with μ1 and μ2 compatible mappings. From μ2 ∈ Ω2 \ Ω3 we have that μ2 ∈ Ω2 
and for every mapping μ' ∈ Ω3, μ2 is not compatible with μ'. Note that since 
μ1 and μ2 are compatible mappings, μ = μ1 ∪ μ2 ∈ Ω1 ⋈ Ω2. Thus, given 
that μ2 is not compatible with any mapping μ' ∈ Ω3, we conclude that μ is 
not compatible with any mapping μ' ∈ Ω3. Thus, μ ∈ (Ω1 ⋈ Ω2) \ Ω3. 

(2) First we show that Ω1 \ Ω2 ⊆ Ω1 \ (Ω1 ⋈ Ω2). Let μ ∈ Ω1 \ Ω2. Then μ ∈ Ω1 
and for all μ' ∈ Ω2, μ is not compatible with μ'. Let μ'' be any mapping in 
Ω1 ⋈ Ω2; then μ'' = μ1 ∪ μ2 with μ1 ∈ Ω1, μ2 ∈ Ω2 and then, since μ is not 
compatible with μ2, necessarily μ is not compatible with μ''. Then μ is not 
compatible with any μ'' ∈ Ω1 ⋈ Ω2, and finally μ ∈ Ω1 \ (Ω1 ⋈ Ω2). Now 
we show that Ω1 \ (Ω1 ⋈ Ω2) ⊆ Ω1 \ Ω2. Let μ ∈ Ω1 \ (Ω1 ⋈ Ω2); then 
μ ∈ Ω1 and for every μ' ∈ Ω1 ⋈ Ω2, μ is not compatible with μ'. Suppose 
that μ is compatible with some μ'' ∈ Ω2; then μ ∪ μ'' ∈ Ω1 ⋈ Ω2 and μ 
is compatible with μ ∪ μ'', which contradicts the assumption that 
μ ∈ Ω1 \ (Ω1 ⋈ Ω2). Finally, μ ∈ Ω1 is not compatible with any μ'' ∈ Ω2 and 
then μ ∈ Ω1 \ Ω2. 

(3) By definition of ⟕, we have that Ω1 ⟕ (Ω1 ⋈ Ω2) = (Ω1 ⋈ (Ω1 ⋈ Ω2)) ∪ 
(Ω1 \ (Ω1 ⋈ Ω2)). By associativity of AND, we have that Ω1 ⋈ (Ω1 ⋈ Ω2) = 
((Ω1 ⋈ Ω1) ⋈ Ω2), which in turn is equal to Ω1 ⋈ Ω2 since Ω1 ⋈ Ω1 = Ω1 by 
Lemma 2 and the fact that Ω1 ⊆ [[P1]]_D and P1 is a UNION-free expression. 
Furthermore, by property (2), we conclude that (Ω1 \ (Ω1 ⋈ Ω2)) = Ω1 \ Ω2 
and, therefore, Ω1 ⟕ (Ω1 ⋈ Ω2) = (Ω1 ⋈ (Ω1 ⋈ Ω2)) ∪ (Ω1 \ (Ω1 ⋈ Ω2)) = 
(Ω1 ⋈ Ω2) ∪ (Ω1 \ Ω2) = Ω1 ⟕ Ω2. 

□ 
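
Properties (1) and (2) of Lemma 3 hold for arbitrary sets of mappings, so they can be sanity-checked on random inputs. The Python sketch below is only such a check under assumptions made here (variables ?X, ?Y, ?Z, values 0 and 1, hypothetical function names); it does not replace the proof.

```python
import random

# A mapping is a frozenset of (variable, value) pairs.

def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}

def diff(o1, o2):
    """Mappings of o1 that are compatible with no mapping of o2."""
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}

def random_mappings(rng, size):
    """A random set of partial mappings over ?X, ?Y, ?Z with values 0 or 1."""
    result = set()
    for _ in range(size):
        chosen = [v for v in ("?X", "?Y", "?Z") if rng.random() < 0.6]
        result.add(frozenset((v, rng.randint(0, 1)) for v in chosen))
    return result

def check_lemma_3(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        o1, o2, o3 = (random_mappings(rng, 3) for _ in range(3))
        part_1 = join(o1, diff(o2, o3)) <= diff(join(o1, o2), o3)
        part_2 = diff(o1, o2) == diff(o1, join(o1, o2))
        if not (part_1 and part_2):
            return False
    return True
```

Property (3) is not covered by this check, because it additionally requires Ω1 to be a subset of the answer to a UNION-free pattern.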

Lemma 4. Let P be a UNION-free graph pattern and ?X ∈ var(P) a variable of 
P. If there is an occurrence of ?X in P that is not in the right-hand side 
of any OPT subpattern of P, then ?X ∈ dom(μ) for all μ ∈ [[P]]_D. 

Proof: First note that the lemma speaks of an occurrence of a variable ?X and not 
of the variable itself. The intuition of this lemma is that, if an occurrence of ?X 
appears in at least one of the mandatory parts of P, then the variable must be 
bound in all the mappings of [[P]]_D. The formal proof is by induction on the 
construction of the pattern. 

(1) If P is a triple pattern and ?X ∈ var(P) then clearly ?X ∈ dom(μ) for all 
μ ∈ [[P]]_D. 

(2) Suppose P = (P1 AND P2). If the occurrence of ?X that concerns us is 
in P1 then by induction hypothesis, ?X ∈ dom(μ) for all μ ∈ [[P1]]_D and then 
?X ∈ dom(μ) for all μ ∈ [[(P1 AND P2)]]_D. The case for P2 is the same. 

(3) Suppose P = (P1 OPT P2); then the occurrence of ?X that concerns us is 
necessarily in P1. By induction hypothesis ?X ∈ dom(μ) for all μ ∈ [[P1]]_D and 
then by the definition of OPT, ?X ∈ dom(μ) for all μ ∈ [[(P1 OPT P2)]]_D. 

□ 



Lemma 5. Let D be an RDF database and P a well-designed graph pattern 
expression. Assume that P' = (P1 OPT P2) is a sub-pattern of P and ?X is a variable 
such that ?X occurs in P2 and ?X occurs in P outside P'. Then ?X ∈ dom(μ) for 
every μ ∈ [[P1]]_D. 

Proof: Let P' = (P1 OPT P2) be a subpattern of a well-designed graph pattern P 
such that ?X ∈ var(P2) and ?X occurs outside P'. By the property of P of being 
well designed, we have that ?X ∈ var(P1). We concentrate now on subpatterns of P1. 
Note that because ?X ∈ var(P2) and by the hypothesis of P being well designed, 
for every occurrence of ?X in the right-hand side of an OPT subpattern of P1 there 
is an occurrence of ?X in the left-hand side of the same OPT subpattern. The 
last statement implies that there is necessarily an occurrence of ?X that is not in 
the right-hand side of any of the OPT subpatterns of P1, because if this were not 
the case P1 would have an infinite number of occurrences of ?X (we would never 
stop applying the property of well-designed patterns). Then applying Lemma 4 we 
obtain that for every μ ∈ [[P1]]_D, ?X ∈ dom(μ), completing the proof. □ 

Lemma 6. Let D be an RDF database and P a well-designed graph pattern 
expression. Suppose that P' is a sub-pattern of P and ?X is a variable such that 
?X occurs in P' and ?X occurs in P outside P'. Then ?X ∈ dom(μ) for every 
μ ∈ [[P']]_D. 

Proof: By induction on P'. 

(1) If P' is a triple pattern t, then ?X ∈ dom(μ) for every μ ∈ [[t]]_D. 

(2) Let P' = (P1 AND P2). If ?X ∈ var(P1), then by induction hypothesis 
?X ∈ dom(μ) for every μ ∈ [[P1]]_D, and then ?X ∈ dom(ν) for every 
ν ∈ [[(P1 AND P2)]]_D. If ?X ∈ var(P2) the proof is similar. 

(3) Let P' = (P1 OPT P2). If ?X ∈ var(P1) then by induction hypothesis 
?X ∈ dom(μ) for every μ ∈ [[P1]]_D, and then ?X ∈ dom(ν) for every 
ν ∈ [[(P1 OPT P2)]]_D. If ?X ∈ var(P2), then given that P is a 
well-designed graph pattern expression and ?X occurs in P outside P', we have that 
?X ∈ var(P1). We conclude that ?X ∈ dom(ν) for every ν ∈ [[(P1 OPT P2)]]_D 
as in the previous case. 

(4) Let P' = (P1 FILTER R). Then ?X ∈ var(P1) and, thus, by induction hypothesis 
?X ∈ dom(μ) for every μ ∈ [[P1]]_D. Now by definition [[(P1 FILTER R)]]_D ⊆ 
[[P1]]_D and, therefore, ?X ∈ dom(ν) for every ν ∈ [[(P1 FILTER R)]]_D. 

□ 

Proof of Theorem 5: We will prove that during the execution of Eval_D(·), for 
every call Eval_D(P, Ω) it holds that Eval_D(P, Ω) = Ω ⋈ [[P]]_D. This immediately 
implies that Eval_D(P) = [[P]]_D because Eval_D(P) = Eval_D(P, {μ_∅}). 
The property trivially holds when Ω = ∅ since Eval_D(P, ∅) = ∅ = ∅ ⋈ [[P]]_D. 
Thus, we assume that Ω ≠ ∅. Now the proof goes by induction on P. 

- If P is a triple pattern t, then Eval_D(P, Ω) = Ω ⋈ [[t]]_D. 

- Suppose that P = (P1 AND P2). Computing Eval_D(P, Ω) is equivalent to 
computing Eval_D(P2, Eval_D(P1, Ω)); then by induction hypothesis, Eval_D(P, Ω) = 
Eval_D(P2, Ω ⋈ [[P1]]_D) = Ω ⋈ [[P1]]_D ⋈ [[P2]]_D = Ω ⋈ [[(P1 AND P2)]]_D. 

- Suppose that P = (P1 OPT P2). Computing Eval_D(P, Ω) is equivalent to 
computing Eval_D(P1, Ω) ⟕ Eval_D(P2, Eval_D(P1, Ω)) and then by induction 
hypothesis Eval_D(P, Ω) = (Ω ⋈ [[P1]]_D) ⟕ (Ω ⋈ [[P1]]_D ⋈ [[P2]]_D). Thus, we 
need to show that 

(Ω ⋈ [[P1]]_D) ⟕ (Ω ⋈ [[P1]]_D ⋈ [[P2]]_D) = Ω ⋈ ([[P1]]_D ⟕ [[P2]]_D). 



First we show that Ω ⋈ ([[P1]]_D ⟕ [[P2]]_D) ⊆ (Ω ⋈ [[P1]]_D) ⟕ (Ω ⋈ [[P1]]_D ⋈ 
[[P2]]_D). Let μ ∈ Ω ⋈ ([[P1]]_D ⟕ [[P2]]_D); then μ = μ1 ∪ μ2 where μ1 ∈ Ω, 
μ2 ∈ ([[P1]]_D ⟕ [[P2]]_D), and μ1, μ2 are compatible mappings. We consider two 
cases: 

(a) μ2 ∈ [[P1]]_D ⋈ [[P2]]_D. Then μ ∈ Ω ⋈ ([[P1]]_D ⋈ [[P2]]_D) and, hence, 
by commutativity and associativity of the AND operator and Lemma 2, 
we have that μ ∈ (Ω ⋈ [[P1]]_D) ⋈ (Ω ⋈ [[P1]]_D ⋈ [[P2]]_D) ⊆ (Ω ⋈ 
[[P1]]_D) ⟕ (Ω ⋈ [[P1]]_D ⋈ [[P2]]_D). 

(b) μ2 ∈ [[P1]]_D \ [[P2]]_D. Then μ ∈ Ω ⋈ ([[P1]]_D \ [[P2]]_D) ⊆ (Ω ⋈ [[P1]]_D) \ 
[[P2]]_D (by Lemma 3 (1)) and, thus, μ ∈ (Ω ⋈ [[P1]]_D) \ (Ω ⋈ [[P1]]_D ⋈ 
[[P2]]_D) (by Lemma 3 (2) and commutativity and associativity of the AND 
operator). We conclude that μ ∈ (Ω ⋈ [[P1]]_D) ⟕ (Ω ⋈ [[P1]]_D ⋈ [[P2]]_D). 

Now we show that $(\Omega \bowtie [[P_1]]_D) ⟕ (\Omega \bowtie [[P_1]]_D \bowtie [[P_2]]_D) \subseteq \Omega \bowtie ([[P_1]]_D ⟕ [[P_2]]_D)$.
By the definition of $⟕$, it is sufficient to show that
$(\Omega \bowtie [[P_1]]_D) \bowtie (\Omega \bowtie [[P_1]]_D \bowtie [[P_2]]_D) \subseteq \Omega \bowtie ([[P_1]]_D ⟕ [[P_2]]_D)$, and that
$(\Omega \bowtie [[P_1]]_D) \setminus (\Omega \bowtie [[P_1]]_D \bowtie [[P_2]]_D) \subseteq \Omega \bowtie ([[P_1]]_D ⟕ [[P_2]]_D)$:

(a) By commutativity and associativity of the AND operator and Lemma 2,
we have that $(\Omega \bowtie [[P_1]]_D) \bowtie (\Omega \bowtie [[P_1]]_D \bowtie [[P_2]]_D) = \Omega \bowtie [[P_1]]_D \bowtie [[P_2]]_D \subseteq \Omega \bowtie ([[P_1]]_D ⟕ [[P_2]]_D)$.

(b) By Lemma 3 (2), showing that $(\Omega \bowtie [[P_1]]_D) \setminus (\Omega \bowtie [[P_1]]_D \bowtie [[P_2]]_D) \subseteq \Omega \bowtie ([[P_1]]_D ⟕ [[P_2]]_D)$
is equivalent to showing that $(\Omega \bowtie [[P_1]]_D) \setminus [[P_2]]_D \subseteq \Omega \bowtie ([[P_1]]_D ⟕ [[P_2]]_D)$.
Let $\mu \in (\Omega \bowtie [[P_1]]_D)$ be such that for every
$\mu' \in [[P_2]]_D$, $\mu$ is not compatible with $\mu'$. Then $\mu = \mu_1 \cup \mu_2$ with $\mu_1 \in \Omega$,
$\mu_2 \in [[P_1]]_D$, and $\mu_1$, $\mu_2$ compatible mappings. Furthermore, for every
$\mu' \in [[P_2]]_D$, $\mu_1 \cup \mu_2$ is not compatible with $\mu'$. Suppose that $\mu_2$ is not compatible
with any $\mu' \in [[P_2]]_D$. Then $\mu_2 \in [[P_1]]_D \setminus [[P_2]]_D \subseteq [[P_1]]_D ⟕ [[P_2]]_D$,
and then $\mu = \mu_1 \cup \mu_2 \in \Omega \bowtie ([[P_1]]_D ⟕ [[P_2]]_D)$. Suppose now that $\mu_2$
is compatible with some $\nu \in [[P_2]]_D$, but $\mu_1$ is not compatible with $\nu$.
Then there exists a variable $?X \in \mathrm{dom}(\mu_1)$ such that $?X \in \mathrm{dom}(\nu)$ and
$\mu_1(?X) \neq \nu(?X)$. Since $\mu_2$ is compatible with both $\mu_1$ and $\nu$, we have
that $?X \notin \mathrm{dom}(\mu_2)$. This implies that $?X$ is in the domain of a mapping
in $\Omega$ since $\mu_1 \in \Omega$ and, hence, $?X$ is defined outside $P = (P_1 \text{ OPT } P_2)$.
Furthermore, $?X \in \mathrm{var}(P_2)$ since $?X \in \mathrm{dom}(\nu)$, and there exists a mapping
$\omega = \mu_2 \in [[P_1]]_D$ such that $?X \notin \mathrm{dom}(\omega)$, which contradicts Lemma 5.
This concludes the proof of the inclusion
$(\Omega \bowtie [[P_1]]_D) \setminus (\Omega \bowtie [[P_1]]_D \bowtie [[P_2]]_D) \subseteq \Omega \bowtie ([[P_1]]_D ⟕ [[P_2]]_D)$.

- Suppose that $P = (P_1 \text{ FILTER } R)$. Computing $\mathit{Eval}_D(P, \Omega)$ results in the set
of mappings $\{\mu \in \mathit{Eval}_D(P_1, \Omega) \mid \mu \models R\}$. By induction hypothesis this set is
equal to $\{\mu \in \Omega \bowtie [[P_1]]_D \mid \mu \models R\}$. Thus, we need to show that this set is equal
to $\Omega \bowtie [[(P_1 \text{ FILTER } R)]]_D$. First, assume that $\nu \in \Omega \bowtie [[(P_1 \text{ FILTER } R)]]_D$.
Then $\nu = \nu_1 \cup \nu_2$ with $\nu_1 \in \Omega$, $\nu_2 \in [[(P_1 \text{ FILTER } R)]]_D$ and $\nu_1$, $\nu_2$ compatible
mappings. Since $\nu_2 \in [[(P_1 \text{ FILTER } R)]]_D$ we have that $\nu_2 \in [[P_1]]_D$ and
$\nu_2 \models R$. Next we show that $\nu \models R$. By contradiction, assume that $\nu \not\models R$.
Then given that $\nu_2 \models R$ and $\nu = \nu_1 \cup \nu_2$, there is a variable $?X \in \mathrm{var}(R)$ such
that $?X \in \mathrm{dom}(\nu)$ but $?X \notin \mathrm{dom}(\nu_2)$. But this implies that $?X \in \mathrm{dom}(\nu_1)$
and, therefore, $?X$ occurs outside $P$ since $\nu_1 \in \Omega$. We conclude that $?X$ occurs
in $P$, $?X$ occurs outside $P$ and there exists a mapping $\omega = \nu_2 \in [[P]]_D$ such
that $?X \notin \mathrm{dom}(\omega)$, which contradicts Lemma 6. Thus, we conclude that $\nu \models R$
and, therefore, $\nu = \nu_1 \cup \nu_2 \in \{\mu \in \Omega \bowtie [[P_1]]_D \mid \mu \models R\}$. Second, assume
that $\nu \in \{\mu \in \Omega \bowtie [[P_1]]_D \mid \mu \models R\}$. Then $\nu \models R$ and $\nu = \nu_1 \cup \nu_2$ with
$\nu_1 \in \Omega$, $\nu_2 \in [[P_1]]_D$ and $\nu_1$, $\nu_2$ compatible mappings. Next we show that
$\nu_2 \models R$. By contradiction, assume that $\nu_2 \not\models R$. Then given that $\nu \models R$
and $\nu = \nu_1 \cup \nu_2$, we have that there exists a variable $?X \in \mathrm{var}(R)$ such that
$?X \in \mathrm{dom}(\nu)$ but $?X \notin \mathrm{dom}(\nu_2)$. But this implies that $?X \in \mathrm{dom}(\nu_1)$ and,
therefore, $?X$ occurs outside $P_1$ since $\nu_1 \in \Omega$. We conclude that $?X$ occurs in
$P_1$ since $\mathrm{var}(R) \subseteq \mathrm{var}(P_1)$, $?X$ occurs outside $P_1$ and there exists a mapping
$\omega = \nu_2 \in [[P_1]]_D$ such that $?X \notin \mathrm{dom}(\omega)$, which contradicts Lemma 6. Thus,
we conclude that $\nu_2 \models R$ and, therefore, $\nu_2 \in [[(P_1 \text{ FILTER } R)]]_D$. Hence, we
deduce that $\nu = \nu_1 \cup \nu_2 \in \Omega \bowtie [[(P_1 \text{ FILTER } R)]]_D$. This concludes the proof
of the theorem.

□ 
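The statement just proved can be tried out on small inputs. The following is a minimal Python sketch, not the authors' implementation: the tuple encoding of patterns, the helper names (`match`, `sat`, `sem`, `eval_d`), the restriction of filter conditions to `bound` and equality with a constant, and both example datasets are assumptions made for illustration. `sem` follows the compositional semantics $[[P]]_D$ and `eval_d` follows the recursive procedure $\mathit{Eval}_D(P, \Omega)$ as it is described in the cases of the proof.

```python
def compatible(a, b):
    """Two mappings are compatible if they agree on every shared variable."""
    da, db = dict(a), dict(b)
    return all(db[k] == v for k, v in da.items() if k in db)

def join(o1, o2):
    return {a | b for a in o1 for b in o2 if compatible(a, b)}

def diff(o1, o2):
    return {a for a in o1 if not any(compatible(a, b) for b in o2)}

def left_join(o1, o2):
    return join(o1, o2) | diff(o1, o2)

def match(t, D):
    """[[t]]_D for a triple pattern t; strings starting with '?' are variables."""
    out = set()
    for triple in D:
        m, ok = {}, True
        for pat, val in zip(t, triple):
            if pat.startswith("?"):
                ok = ok and m.setdefault(pat, val) == val
            else:
                ok = ok and pat == val
        if ok:
            out.add(frozenset(m.items()))
    return out

def sat(m, R):
    """Assumed filter fragment: ('bound', ?X) and ('eq', ?X, constant)."""
    d = dict(m)
    if R[0] == "bound":
        return R[1] in d
    return d.get(R[1]) == R[2]

def sem(P, D):
    """Compositional semantics [[P]]_D."""
    op = P[0]
    if op == "TRIPLE":
        return match(P[1], D)
    if op == "AND":
        return join(sem(P[1], D), sem(P[2], D))
    if op == "OPT":
        return left_join(sem(P[1], D), sem(P[2], D))
    return {m for m in sem(P[1], D) if sat(m, P[2])}  # FILTER

def eval_d(P, omega, D):
    """Operational evaluation Eval_D(P, Omega), following the proof's cases."""
    if not omega:
        return set()
    op = P[0]
    if op == "TRIPLE":
        return join(omega, match(P[1], D))
    if op == "AND":
        return eval_d(P[2], eval_d(P[1], omega, D), D)
    if op == "OPT":
        left = eval_d(P[1], omega, D)
        return left_join(left, eval_d(P[2], left, D))
    return {m for m in eval_d(P[1], omega, D) if sat(m, P[2])}  # FILTER

def T(s, p, o):
    return ("TRIPLE", (s, p, o))

EMPTY = frozenset()  # the mapping with empty domain

# Well designed example (hypothetical data): both semantics coincide.
D = {("a", "name", "Ann"), ("a", "email", "ann@x.org"), ("b", "name", "Bob")}
well = ("OPT", T("?p", "name", "?n"), T("?p", "email", "?e"))
filtered = ("FILTER", well, ("bound", "?e"))

# Not well designed: ?X occurs in the optional part and outside the OPT,
# but not in the mandatory part of the OPT.
D2 = {("p1", "name", "paul"), ("g1", "name", "george"), ("g2", "email", "e")}
bad = ("AND", T("?X", "name", "paul"),
       ("OPT", T("?Y", "name", "george"), T("?X", "email", "?Z")))
```

On the pattern `bad` the two functions disagree, which is consistent with the theorem being stated only for well designed patterns; the sketch checks individual instances and proves nothing.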

A.6 Proof of Proposition 2

First we show that for every subpattern $(P_1 \text{ AND } (P_2 \text{ OPT } P_3))$ of a well designed
pattern $P$, it holds that $(P_1 \text{ AND } (P_2 \text{ OPT } P_3)) \equiv ((P_1 \text{ AND } P_2) \text{ OPT } P_3)$.
Proof: To simplify the notation we will suppose that $\mu_1 \in [[P_1]]_D$, $\mu_2 \in [[P_2]]_D$,
and $\mu_3 \in [[P_3]]_D$.

- First $[[(P_1 \text{ AND } (P_2 \text{ OPT } P_3))]]_D \subseteq [[((P_1 \text{ AND } P_2) \text{ OPT } P_3)]]_D$. Let
$\mu \in [[(P_1 \text{ AND } (P_2 \text{ OPT } P_3))]]_D = [[P_1]]_D \bowtie ([[P_2]]_D ⟕ [[P_3]]_D)$. Then $\mu = \mu_1 \cup \mu'$
with $\mu_1$ and $\mu'$ compatible mappings, and $\mu' \in [[P_2]]_D ⟕ [[P_3]]_D$. Depending on
$\mu'$ there are two cases:

• If $\mu' \in [[P_2]]_D \bowtie [[P_3]]_D$ then $\mu \in [[P_1]]_D \bowtie ([[P_2]]_D \bowtie [[P_3]]_D)$, and then
$\mu \in ([[P_1]]_D \bowtie [[P_2]]_D) \bowtie [[P_3]]_D \subseteq [[((P_1 \text{ AND } P_2) \text{ OPT } P_3)]]_D$.

• If $\mu' \in [[P_2]]_D \setminus [[P_3]]_D$ then $\mu' \in [[P_2]]_D$ and is incompatible with every
$\mu_3 \in [[P_3]]_D$. Then $\mu = \mu_1 \cup \mu'$ is incompatible with $\mu_3$ and then
$\mu \in ([[P_1]]_D \bowtie [[P_2]]_D) \setminus [[P_3]]_D \subseteq ([[P_1]]_D \bowtie [[P_2]]_D) ⟕ [[P_3]]_D$ and then
$\mu \in [[((P_1 \text{ AND } P_2) \text{ OPT } P_3)]]_D$.

- Now $[[((P_1 \text{ AND } P_2) \text{ OPT } P_3)]]_D \subseteq [[(P_1 \text{ AND } (P_2 \text{ OPT } P_3))]]_D$. Let
$\mu \in [[((P_1 \text{ AND } P_2) \text{ OPT } P_3)]]_D = ([[P_1]]_D \bowtie [[P_2]]_D) ⟕ [[P_3]]_D$. There are two
cases:

• $\mu \in ([[P_1]]_D \bowtie [[P_2]]_D) \bowtie [[P_3]]_D = [[P_1]]_D \bowtie ([[P_2]]_D \bowtie [[P_3]]_D) \subseteq [[P_1]]_D \bowtie ([[P_2]]_D ⟕ [[P_3]]_D)$, then
$\mu \in [[(P_1 \text{ AND } (P_2 \text{ OPT } P_3))]]_D$.

• $\mu \in ([[P_1]]_D \bowtie [[P_2]]_D) \setminus [[P_3]]_D$, then $\mu = \mu_1 \cup \mu_2$ with $\mu_1$ and $\mu_2$ compatible
mappings and for every $\mu_3$, $\mu_1 \cup \mu_2$ is incompatible with $\mu_3$. Suppose
first that $\mu_2$ is incompatible with $\mu_3$, then $\mu_2 \in [[P_2]]_D \setminus [[P_3]]_D \subseteq [[P_2]]_D ⟕ [[P_3]]_D$,
and then $\mu_1 \cup \mu_2 \in [[P_1]]_D \bowtie ([[P_2]]_D ⟕ [[P_3]]_D) = [[(P_1 \text{ AND } (P_2 \text{ OPT } P_3))]]_D$.
Suppose now that $\mu_1$ is incompatible with
$\mu_3$, then there exists a variable $?X \in \mathrm{dom}(\mu_1)$, $?X \in \mathrm{dom}(\mu_3)$ such that
$\mu_1(?X) \neq \mu_3(?X)$. This last statement implies that $?X \in \mathrm{var}(P_1) \cap \mathrm{var}(P_3)$
and then, because $P$ is well designed, by Lemma 5 we obtain $?X \in \mathrm{dom}(\mu_2)$,
and because $\mu_2$ is compatible with $\mu_1$ we have that $\mu_2(?X) \neq \mu_3(?X)$.
Finally $\mu_2 \in [[P_2]]_D \setminus [[P_3]]_D \subseteq [[P_2]]_D ⟕ [[P_3]]_D$, and then
$\mu = \mu_1 \cup \mu_2 \in [[(P_1 \text{ AND } (P_2 \text{ OPT } P_3))]]_D$.

□ 
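The role of the well designed condition in this equivalence can be seen directly on sets of mappings. The sketch below is an illustration under assumptions: the three sets stand for $[[P_1]]_D$, $[[P_2]]_D$ and $[[P_3]]_D$, the concrete bindings are invented, and the helper `mp` is a hypothetical convenience. In the first instance the variable X is bound in the first and third sets but not in the second, which is the situation Lemma 5 rules out for well designed patterns; in the second instance every mapping of the second set binds X.

```python
def compatible(a, b):
    da, db = dict(a), dict(b)
    return all(db[k] == v for k, v in da.items() if k in db)

def join(o1, o2):
    return {a | b for a in o1 for b in o2 if compatible(a, b)}

def diff(o1, o2):
    return {a for a in o1 if not any(compatible(a, b) for b in o2)}

def left_join(o1, o2):
    return join(o1, o2) | diff(o1, o2)

def mp(**bindings):
    """Hypothetical helper: build a hashable mapping from keyword bindings."""
    return frozenset(bindings.items())

def lhs(o1, o2, o3):
    """Semantics of (P1 AND (P2 OPT P3))."""
    return join(o1, left_join(o2, o3))

def rhs(o1, o2, o3):
    """Semantics of ((P1 AND P2) OPT P3)."""
    return left_join(join(o1, o2), o3)

o1 = {mp(X=1)}
o3 = {mp(X=9, Z=3)}
o2_without_x = {mp(Y=2)}        # X missing: condition of Lemma 5 violated
o2_with_x = {mp(X=1, Y=2)}      # X bound in every mapping of the second set

different = lhs(o1, o2_without_x, o3) != rhs(o1, o2_without_x, o3)
equal = lhs(o1, o2_with_x, o3) == rhs(o1, o2_with_x, o3)
```

In the first instance the left-hand side is empty while the right-hand side keeps the mapping binding X and Y, so the two patterns are not equivalent without the hypothesis.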

Now we show that for every subpattern $((P_1 \text{ OPT } P_2) \text{ OPT } P_3)$ of a well designed
pattern $P$, it holds that $((P_1 \text{ OPT } P_2) \text{ OPT } P_3) \equiv ((P_1 \text{ OPT } P_3) \text{ OPT } P_2)$.
Proof:

- First $[[((P \text{ OPT } P_1) \text{ OPT } P_2)]]_D \subseteq [[((P \text{ OPT } P_2) \text{ OPT } P_1)]]_D$. Let
$\mu \in [[((P \text{ OPT } P_1) \text{ OPT } P_2)]]_D$; then $\mu \in ([[P]]_D ⟕ [[P_1]]_D) ⟕ [[P_2]]_D$. Suppose that
$\mu \in ([[P]]_D ⟕ [[P_1]]_D) \bowtie [[P_2]]_D$. There are two cases:

• $\mu \in ([[P]]_D \bowtie [[P_1]]_D) \bowtie [[P_2]]_D \subseteq ([[P]]_D \bowtie [[P_2]]_D) \bowtie [[P_1]]_D \subseteq [[((P \text{ OPT } P_2) \text{ OPT } P_1)]]_D$.

• $\mu \in ([[P]]_D \setminus [[P_1]]_D) \bowtie [[P_2]]_D \subseteq ([[P]]_D \bowtie [[P_2]]_D) \setminus [[P_1]]_D$, by Proposition 3 (1),
then $\mu \in [[((P \text{ OPT } P_2) \text{ OPT } P_1)]]_D$.

Suppose now that $\mu \in ([[P]]_D ⟕ [[P_1]]_D) \setminus [[P_2]]_D$. There are two cases (we assume
$\mu' \in [[P]]_D$, $\mu_1 \in [[P_1]]_D$, $\mu_2 \in [[P_2]]_D$):

• $\mu \in ([[P]]_D \bowtie [[P_1]]_D) \setminus [[P_2]]_D$, then $\mu = \mu' \cup \mu_1$ with $\mu'$, $\mu_1$ compatible mappings,
and for every $\mu_2$, $\mu' \cup \mu_1$ is incompatible with $\mu_2$. If $\mu'$ is incompatible
with $\mu_2$ then $\mu' \in [[P]]_D \setminus [[P_2]]_D$ and then $\mu' \cup \mu_1 \in ([[P]]_D \setminus [[P_2]]_D) \bowtie [[P_1]]_D$
and then $\mu \in [[((P \text{ OPT } P_2) \text{ OPT } P_1)]]_D$. Suppose that $\mu_1$ is
incompatible with $\mu_2$, then there is $?X$ such that $\mu_1(?X) \neq \mu_2(?X)$. Then
$?X \in \mathrm{var}(P_1) \cap \mathrm{var}(P_2)$ and because the whole pattern is well designed,
by Lemma 5 we obtain that $?X \in \mathrm{dom}(\mu')$, and since $\mu'$ is compatible with $\mu_1$ we
obtain that $\mu'(?X) \neq \mu_2(?X)$, and then $\mu'$ is incompatible with $\mu_2$. Then
$\mu \in [[((P \text{ OPT } P_2) \text{ OPT } P_1)]]_D$.

• $\mu \in ([[P]]_D \setminus [[P_1]]_D) \setminus [[P_2]]_D$, then $\mu \in [[P]]_D$ and is such that for all $\mu_1$
and for all $\mu_2$, $\mu$ is incompatible with $\mu_1$ and $\mu_2$, and then
$\mu \in ([[P]]_D \setminus [[P_2]]_D) \setminus [[P_1]]_D \subseteq [[((P \text{ OPT } P_2) \text{ OPT } P_1)]]_D$.

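- Now we show that $[[((P \text{ OPT } P_2) \text{ OPT } P_1)]]_D \subseteq [[((P \text{ OPT } P_1) \text{ OPT } P_2)]]_D$.
Let $\mu \in [[((P \text{ OPT } P_2) \text{ OPT } P_1)]]_D$; then $\mu \in ([[P]]_D ⟕ [[P_2]]_D) ⟕ [[P_1]]_D$
(again we assume $\mu' \in [[P]]_D$, $\mu_1 \in [[P_1]]_D$, $\mu_2 \in [[P_2]]_D$). Suppose that
$\mu \in ([[P]]_D ⟕ [[P_2]]_D) \bowtie [[P_1]]_D$. There are two cases:

• $\mu \in ([[P]]_D \bowtie [[P_2]]_D) \bowtie [[P_1]]_D \subseteq [[((P \text{ OPT } P_1) \text{ OPT } P_2)]]_D$.

• $\mu \in ([[P]]_D \setminus [[P_2]]_D) \bowtie [[P_1]]_D \subseteq ([[P]]_D \bowtie [[P_1]]_D) \setminus [[P_2]]_D$ by Proposition 3 (1),
and then $\mu \in [[((P \text{ OPT } P_1) \text{ OPT } P_2)]]_D$.

Suppose now that $\mu \in ([[P]]_D ⟕ [[P_2]]_D) \setminus [[P_1]]_D$. There are two cases:

• $\mu \in ([[P]]_D \bowtie [[P_2]]_D) \setminus [[P_1]]_D$, then $\mu = \mu' \cup \mu_2$ with $\mu'$, $\mu_2$ compatible mappings
such that for every $\mu_1 \in [[P_1]]_D$, $\mu' \cup \mu_2$ is incompatible with $\mu_1$. If $\mu'$
is incompatible with $\mu_1$ then $\mu' \in [[P]]_D \setminus [[P_1]]_D$ and then
$\mu' \cup \mu_2 \in ([[P]]_D \setminus [[P_1]]_D) \bowtie [[P_2]]_D \subseteq ([[P]]_D ⟕ [[P_1]]_D) ⟕ [[P_2]]_D$ and then
$\mu \in [[((P \text{ OPT } P_1) \text{ OPT } P_2)]]_D$. If $\mu_2$ is incompatible with $\mu_1$ then there exists
a variable $?X \in \mathrm{dom}(\mu_1) \cap \mathrm{dom}(\mu_2)$ such that $\mu_1(?X) \neq \mu_2(?X)$. Then
$?X \in \mathrm{var}(P_1) \cap \mathrm{var}(P_2)$ and because the whole pattern is well designed,
by Lemma 5 we obtain that $?X \in \mathrm{dom}(\mu')$, and since $\mu'$ is compatible with $\mu_2$ we
obtain that $\mu'(?X) \neq \mu_1(?X)$, and then $\mu'$ is incompatible with $\mu_1$. Then
$\mu \in [[((P \text{ OPT } P_1) \text{ OPT } P_2)]]_D$.

• $\mu \in ([[P]]_D \setminus [[P_2]]_D) \setminus [[P_1]]_D \subseteq ([[P]]_D \setminus [[P_1]]_D) \setminus [[P_2]]_D \subseteq [[((P \text{ OPT } P_1) \text{ OPT } P_2)]]_D$.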

□ 

To finish the proof we must show that applying the respective equivalences does
not affect the property of $P$ of being well designed. Let $(P_1 \text{ AND } (P_2 \text{ OPT } P_3))$
be a subpattern of $P$. Well designed says that, if a variable $?X$ occurs outside
$(P_2 \text{ OPT } P_3)$ and inside $P_3$, then it occurs in $P_2$. Suppose that this is the case and
that $?X$ occurs outside $(P_1 \text{ AND } (P_2 \text{ OPT } P_3))$. Then, because $?X$ occurs in $P_2$,
$?X$ occurs in $(P_1 \text{ AND } P_2)$ and the pattern $P'$ obtained from $P$ by replacing
$(P_1 \text{ AND } (P_2 \text{ OPT } P_3))$ by $((P_1 \text{ AND } P_2) \text{ OPT } P_3)$ is well designed. Suppose now
that $?X$ occurs in $P_1$ but does not occur outside $(P_1 \text{ AND } (P_2 \text{ OPT } P_3))$. Then
$?X$ does not occur outside $((P_1 \text{ AND } P_2) \text{ OPT } P_3)$ and then the pattern obtained
from $P$ is well designed.

The proof for $P' = ((P_1 \text{ OPT } P_2) \text{ OPT } P_3)$ is similar. There are various cases for
variables occurring inside $P_2$, $P_3$:






- $?X$ occurs in $P_2$ and in $P_3$,

- $?X$ occurs in $P_2$ and outside $P'$ but not in $P_3$,

- $?X$ occurs in $P_3$ and outside $P'$ but not in $P_2$,

in all cases, because $P$ is well designed, $?X$ occurs in $P_1$ and then the pattern
obtained from $P$ by replacing $P'$ by $((P_1 \text{ OPT } P_3) \text{ OPT } P_2)$ is well designed.
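The condition on variables used in this argument can be checked mechanically. The following Python sketch is an assumption-laden illustration: patterns are encoded as nested tuples, only AND and OPT are handled (FILTER and its safety condition are left out), and the function names are hypothetical. For every subpattern $(P_1 \text{ OPT } P_2)$ it tests that each variable occurring both inside $P_2$ and outside the subpattern also occurs in $P_1$.

```python
def T(s, p, o):
    """Triple pattern; strings starting with '?' are variables."""
    return ("TRIPLE", (s, p, o))

def variables(p):
    """All variables occurring in a pattern built from TRIPLE, AND and OPT."""
    if p[0] == "TRIPLE":
        return {x for x in p[1] if x.startswith("?")}
    return variables(p[1]) | variables(p[2])

def well_designed(p, outside=frozenset()):
    """`outside` holds the variables occurring in the whole pattern outside p."""
    if p[0] == "TRIPLE":
        return True
    op, left, right = p
    if op == "OPT":
        # Variables of the optional part that also occur outside this OPT
        # must occur in the mandatory part.
        if not (variables(right) & outside) <= variables(left):
            return False
    return (well_designed(left, outside | variables(right))
            and well_designed(right, outside | variables(left)))

# ?X occurs in the optional part and outside the OPT, but not in its left side.
bad = ("AND", T("?X", "name", "paul"),
       ("OPT", T("?Y", "name", "george"), T("?X", "email", "?Z")))

# Moving the OPT outwards, as in the first equivalence, gives a pattern that
# satisfies the condition; this is only an example, not the general proof.
good = ("OPT", ("AND", T("?X", "name", "paul"), T("?X", "knows", "?Y")),
        T("?X", "email", "?Z"))
```

The check treats "occurs outside" as membership in the union of the variables of all sibling subtrees on the path to the root, which is one straightforward reading of the definition.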

A.7 Proof of Theorem 6

To prove Theorem 6 we use the following Lemma. In the Lemma we use rewriting 
concepts and results (see [2]). 

Lemma 7. Let us consider the theory E formed by the equations of associativity 
and commutativity for AND (Proposition 1), and equation 

$((X \text{ OPT } Y) \text{ OPT } Z) \equiv ((X \text{ OPT } Z) \text{ OPT } Y)$

Then the rule

$(X \text{ AND } (Y \text{ OPT } Z)) \to ((X \text{ AND } Y) \text{ OPT } Z)$ (7)

is E-terminating and E-confluent in the set of well designed patterns, and hence
has E-normal forms in the set of well designed patterns.

Proof: 

(1) First we prove that rule (7) is terminating. Consider the measure 

m(P): number of OPT inside AND-trees in the parsing of P.

Then clearly the theory E keeps m(P) constant. Let P' and P" be the left and 
right hand side in rule (7) respectively, then m(P') > m(P"). Hence successive 
application of rule (7) must terminate. 

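(2) Now we prove that rule (7) is E-locally confluent. Note that the only critical
pair (see [2]) is $((P_1 \text{ OPT } P_2) \text{ AND } (P_3 \text{ OPT } P_4))$. Then it only remains to check
that both applications of rule (7)

$(((P_1 \text{ OPT } P_2) \text{ AND } P_3) \text{ OPT } P_4)$

and

$(((P_3 \text{ OPT } P_4) \text{ AND } P_1) \text{ OPT } P_2)$

can be rewritten to a common term using the axioms of E and the rule (7):

$(((P_1 \text{ OPT } P_2) \text{ AND } P_3) \text{ OPT } P_4) =_E ((P_3 \text{ AND } (P_1 \text{ OPT } P_2)) \text{ OPT } P_4)$
$\to (((P_3 \text{ AND } P_1) \text{ OPT } P_2) \text{ OPT } P_4)$
$=_E (((P_1 \text{ AND } P_3) \text{ OPT } P_2) \text{ OPT } P_4)$

$(((P_3 \text{ OPT } P_4) \text{ AND } P_1) \text{ OPT } P_2) =_E ((P_1 \text{ AND } (P_3 \text{ OPT } P_4)) \text{ OPT } P_2)$
$\to (((P_1 \text{ AND } P_3) \text{ OPT } P_4) \text{ OPT } P_2)$
$=_E (((P_1 \text{ AND } P_3) \text{ OPT } P_2) \text{ OPT } P_4)$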

□ 
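Rule (7) can be applied exhaustively by a short recursive procedure. The sketch below is illustrative only and rests on assumptions: patterns are nested tuples with string leaves, FILTER is not handled, the names `normalize` and `in_normal_form` are hypothetical, and commutativity of AND (one of the identities of E) is applied on the fly when the OPT sits on the left of an AND. The rewriting preserves the semantics only for well designed patterns; the code rewrites syntax and does not check that hypothesis.

```python
def normalize(p):
    """Push every OPT out of AND using rule (7) and commutativity of AND."""
    if isinstance(p, str):
        return p
    op, a, b = p
    a, b = normalize(a), normalize(b)
    if op == "AND":
        if isinstance(b, tuple) and b[0] == "OPT":
            # rule (7): (a AND (y OPT z)) -> ((a AND y) OPT z)
            return ("OPT", normalize(("AND", a, b[1])), b[2])
        if isinstance(a, tuple) and a[0] == "OPT":
            # ((y OPT z) AND b) is E-equal to (b AND (y OPT z)); then rule (7)
            return ("OPT", normalize(("AND", a[1], b)), a[2])
    return (op, a, b)

def in_normal_form(p):
    """True if no AND node has an OPT node as a direct child."""
    if isinstance(p, str):
        return True
    op, a, b = p
    if op == "AND" and any(isinstance(c, tuple) and c[0] == "OPT" for c in (a, b)):
        return False
    return in_normal_form(a) and in_normal_form(b)

# The critical pair from the proof, in both argument orders.
pair = ("AND", ("OPT", "P1", "P2"), ("OPT", "P3", "P4"))
swapped = ("AND", ("OPT", "P3", "P4"), ("OPT", "P1", "P2"))

nf_pair = normalize(pair)
nf_swapped = normalize(swapped)
```

Here `nf_pair` is the term (((P1 AND P3) OPT P2) OPT P4) reached in the proof, while `nf_swapped` is (((P3 AND P1) OPT P4) OPT P2), which differs syntactically but is equal to it modulo the identities of E.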

Theorem 6 follows from the existence of E-normal forms for rule (7), and the
application of (7) and E identities to well designed graph patterns.



